Tobias Marschall authored
Strand-seq pipeline
Ongoing work
Preliminary SV calling using Strand-seq data - summarized in a Snakemake pipeline.
Bioconda environment
To install the correct environment, you can use Bioconda.
- Install MiniConda: In case you do not have Conda yet, it is easiest to just install MiniConda.
- Create environment:

      conda env create -n strandseqnation -f conda-environment.yml
      source activate strandseqnation

That's it, you are ready to go.
How to use it
- Install required software:
* Install [mosaicatcher](https://github.com/friendsofstrandseq/mosaicatcher) (*currently you will need the `develop` branch*)
* Get the R-scripts from [strandsequtils](https://github.com/friendsofstrandseq/strandsequtils)
* Install BSgenome.Hsapiens.UCSC.hg38 (can be skipped if you use the Bioconda environment, see above):
```
source("https://bioconductor.org/biocLite.R")
biocLite('BSgenome.Hsapiens.UCSC.hg38')
```
* [StrandPhaseR](https://github.com/daewoooo/StrandPhaseR) is installed automatically
- Set up the configuration of the snakemake pipeline
* Open `Snake.config.json` and specify the paths to the executables
(such as Mosaicatcher) and to the R scripts.
* Create a subdirectory `bam/` and another subdirectory per sample (e.g.
`bam/NA12878/`). **Multiple samples can now be run together**.
Then copy (or soft-link) the Strand-seq single-cell libraries (one BAM
file per cell) into it. Note that BAM files need to be sorted and indexed,
must contain a read group, and should have duplicates marked.
- Run Snakemake
* Run `snakemake` to compute all tasks locally
* Alternatively, you can ask Snakemake to submit your jobs to an HPC cluster. To this end, edit the `Snake.cluster.json` file according to your available HPC environment and call
```
snakemake -j 100 \
--cluster-config Snake.cluster.json \
--cluster "???"
```
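
Before starting a run, it can help to check that the `bam/` directory follows the layout described above. The following is a minimal sketch, not part of the pipeline: the function name `check_bam_layout` is hypothetical, and it assumes that a BAM index is stored next to each file as either `cell.bam.bai` or `cell.bai`. It only checks for the presence of an index; it does not verify sorting, read groups or duplicate marking.

```python
from pathlib import Path


def check_bam_layout(bam_root="bam"):
    """Return {sample: [BAM files lacking an index]} for a bam/<sample>/ layout.

    Hypothetical helper, not part of the pipeline.
    """
    problems = {}
    for sample_dir in sorted(Path(bam_root).iterdir()):
        if not sample_dir.is_dir():
            continue
        missing = []
        for bam in sorted(sample_dir.glob("*.bam")):
            # Assumption: the index sits next to the BAM file under one of
            # the two common names, cell.bam.bai or cell.bai.
            candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
            if not any(c.exists() for c in candidates):
                missing.append(bam.name)
        problems[sample_dir.name] = missing
    return problems


if __name__ == "__main__":
    if Path("bam").is_dir():
        for sample, missing in check_bam_layout("bam").items():
            status = "ok" if not missing else "missing index: " + ", ".join(missing)
            print(f"{sample}: {status}")
    else:
        print("no bam/ directory found in the current working directory")
```

Run from the pipeline directory, this prints one line per sample subdirectory, so a forgotten index shows up before Snakemake is started.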
SNV calls
The pipeline will run simple SNV calling using samtools and bcftools. If you already have SNV calls, you can skip this step by supplying your VCF files to the pipeline. To do so, make sure the files are tabix-indexed and specify them inside the `Snake.config.json` file:
"snv_calls" : {
"NA12878" : "path/to/snp/calls.vcf.gz"
},
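
To catch a wrong path or a missing index early, the `snv_calls` entries can be checked before running the pipeline. This is a minimal sketch under stated assumptions, not part of the pipeline: the function name `check_snv_calls` is hypothetical, and it assumes the tabix index uses the default `.tbi` suffix next to the VCF file (a `.csi` index would not be recognized).

```python
import json
from pathlib import Path


def check_snv_calls(config):
    """Return a list of problems found in the "snv_calls" section of a config dict.

    Hypothetical helper, not part of the pipeline.
    """
    problems = []
    for sample, vcf in config.get("snv_calls", {}).items():
        vcf_path = Path(vcf)
        if not vcf_path.exists():
            problems.append(f"{sample}: VCF not found: {vcf}")
        # Assumption: tabix wrote its index next to the file as <file>.tbi
        elif not Path(str(vcf_path) + ".tbi").exists():
            problems.append(f"{sample}: no tabix index (.tbi) next to {vcf}")
    return problems


if __name__ == "__main__":
    config_file = Path("Snake.config.json")
    if config_file.exists():
        with config_file.open() as handle:
            for problem in check_snv_calls(json.load(handle)):
                print(problem)
```

An empty output means every listed VCF file exists and has a `.tbi` file beside it; a config without an `snv_calls` section also yields no problems.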