Sascha Meiers authored 71130e87

mosaicatcher

MosaiCatcher pipeline

Structural variant calling from single-cell Strand-seq data, implemented as a Snakemake pipeline.

For more information on Strand-seq, see

Overview of this workflow

This workflow uses Snakemake to execute all steps of MosaiCatcher in order. The starting point is a set of single-cell BAM files from Strand-seq experiments, and the final output is SV predictions in tabular format as well as a graphical representation. To get there, the workflow goes through the following steps:

  1. Read binning in fixed-width genomic windows of 100kb via mosaicatcher
  2. Normalization of coverage with respect to a reference sample (included)
  3. Strand state detection (included)
  4. Haplotype resolution via StrandPhaseR
  5. Multivariate segmentation of cells (mosaicatcher)
  6. Bayesian classification of segments to find SVs using mosaiClassifier (included)
  7. Visualization of results using custom R plots (included)
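Step 1 amounts to assigning each read to a 100 kb window and counting reads per window. The sketch below illustrates that idea with awk on a hypothetical two-column input (chromosome, position). It is not how mosaicatcher reads BAM files, and it skips read filtering and any per-strand counting the real tool may perform; `bin_reads` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Simplified sketch of fixed-width binning (NOT mosaicatcher's implementation).
# Input on stdin: tab-separated "chrom<TAB>position" lines, one per read
# (hypothetical format). Positions are treated as 0-based offsets.
# Output: "chrom<TAB>window_start<TAB>window_end<TAB>read_count", sorted.
bin_reads() {
  local width="${1:-100000}"   # window size in bp; 100 kb as in step 1
  awk -v w="$width" 'BEGIN { FS = OFS = "\t" }
    {
      start = int($2 / w) * w   # left edge of the window containing this read
      n[$1, start]++            # count reads per (chrom, window) pair
    }
    END {
      for (k in n) {
        split(k, key, SUBSEP)   # recover chrom and window start from the key
        print key[1], key[2], key[2] + w, n[k]
      }
    }' | sort -k1,1 -k2,2n
}
```

For a rough look at real data, `samtools view cell.bam | cut -f3,4 | bin_reads` feeds reference name and position into the function; note that SAM positions are 1-based and that unmapped reads (reference `*`) are not filtered here.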

Installation

Choose one of three ways to install and run this workflow:

  1. Install software using Bioconda

    • Installation instructions here
    • Configure Snake.config.json according to your installation
    • Add your single-cell data according to the specifications given below (see Setup)
  2. Run Snakemake together with a Singularity image

    • Instructions here
    • Requires Snakemake and Singularity; no further installation is required
    • Add your single-cell data according to the specifications given below
  3. Run a complete example data set via Docker

    • Requires Docker (tested in version 18.09)
    • Includes a whole data set of 96 RPE-1 cells
    • Example shown here

Setup

  • Download this pipeline

     git clone https://github.com/friendsofstrandseq/pipeline
     cd pipeline
  • Add your single-cell data

    Create a subdirectory bam/sampleName/. Your Strand-seq BAM files of this sample go into two folders:

    • all/ for the total set of BAM files
    • selected/ for the subset of successful Strand-seq libraries (possibly hard-linked to all/)

    Single-cell data must follow these rules:

    • One BAM file per cell
    • Sorted and indexed
    • Index files must have a newer timestamp than their BAM files
    • Each BAM file must contain a read group (@RG) with a common sample name (SM), which must match the folder name (sampleName above)
  • Adapt the config file

    In Snake.config.json you can specify:

    • SNV calls (VCF), if available. The sample name in the VCF must match the one in the BAM files (see SNP calls below).
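The bam/sampleName/all and bam/sampleName/selected layout can be populated with hard links, so that selected/ does not duplicate data. A minimal sketch, assuming each cell's BAM and index are named `<cell>.bam` and `<cell>.bam.bai` in one source directory and that the successful libraries are listed by file name, one per line, in a text file; `setup_sample` is a hypothetical helper, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): populate
# <root>/<sample>/all and <root>/<sample>/selected with hard links.
#   $1  sample name (must equal the SM tag in the BAM read groups)
#   $2  directory holding <cell>.bam and <cell>.bam.bai files (assumed naming)
#   $3  text file listing the selected BAM file names, one per line
#   $4  optional target root (default: bam)
setup_sample() {
  local src="$2" keep="$3" dest="${4:-bam}/$1" bam name
  mkdir -p "$dest/all" "$dest/selected" || return 1
  # Link every cell (BAM and index) into all/.
  for bam in "$src"/*.bam; do
    [[ -e "$bam" ]] || continue          # glob matched nothing
    name="${bam##*/}"
    ln -f "$bam" "$dest/all/$name" && ln -f "$bam.bai" "$dest/all/$name.bai" || return 1
  done
  # Link the successful libraries from all/ into selected/.
  while IFS= read -r name; do
    [[ -n "$name" ]] || continue         # skip blank lines
    ln -f "$dest/all/$name" "$dest/selected/$name" &&
      ln -f "$dest/all/$name.bai" "$dest/selected/$name.bai" || return 1
  done < "$keep"
}
```

Usage: `setup_sample sampleName /path/to/bams selected_cells.txt`. Hard links share the inode with the original file, so modification times are preserved and an index that was newer than its BAM stays newer, which a plain `cp` would not guarantee. Hard links only work within a single filesystem.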

Note: Multiple samples can be run simultaneously; create one subfolder per sample below bam/. The settings in Snake.config.json are applied to all samples.
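Because every folder below bam/ is treated as a sample, a quick pre-flight check over all of them can catch layout mistakes before Snakemake starts. The sketch below checks the Setup rules that are easy to test from the shell: all/ and selected/ exist, each BAM has an index with a newer timestamp, and, when samtools is on the PATH, the @RG SM tag equals the folder name. The function names and the `.bam.bai` index naming are assumptions; "one BAM per cell" and sort order are not checked.

```shell
#!/usr/bin/env bash
# Pre-flight check for bam/<sample>/{all,selected}/ (hypothetical helpers,
# not part of the pipeline). Assumes indexes are named <cell>.bam.bai.

# Print the distinct SM values of @RG lines in a SAM header read from stdin.
rg_samples() {
  awk -F'\t' '$1 == "@RG" {
      for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4)
    }' | sort -u
}

# Check one BAM: index exists and is newer than the BAM; if samtools is
# available, the header must carry exactly one SM, equal to the sample name.
check_bam() {
  local bam="$1" sample="$2" bai="$1.bai"
  [[ -f "$bai" ]] || { echo "no index: $bam" >&2; return 1; }
  [[ "$bai" -nt "$bam" ]] || { echo "index not newer than BAM: $bam" >&2; return 1; }
  if command -v samtools >/dev/null 2>&1; then
    [[ "$(samtools view -H "$bam" | rg_samples)" == "$sample" ]] ||
      { echo "@RG SM does not match '$sample': $bam" >&2; return 1; }
  fi
}

# Check every sample folder below the given root (default: bam).
check_layout() {
  local root="${1:-bam}" dir sample sub bam status=0
  for dir in "$root"/*/; do
    [[ -d "$dir" ]] || continue                  # root has no subfolders
    sample="${dir%/}"; sample="${sample##*/}"    # folder name = sample name
    for sub in all selected; do
      [[ -d "$dir$sub" ]] || { echo "missing folder: $dir$sub/" >&2; status=1; }
    done
    for bam in "$dir"all/*.bam "$dir"selected/*.bam; do
      [[ -e "$bam" ]] || continue                # glob matched nothing
      check_bam "$bam" "$sample" || status=1
    done
  done
  return "$status"
}
```

Usage: `check_layout bam && echo "layout OK"`. The function keeps going after the first problem so that all issues are reported in one run.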

SNP calls

The pipeline runs simple SNV calling on the Strand-seq data using samtools and bcftools. If you already have SNV calls, you can skip this step by providing your own VCF files. To do so, make sure the files are tabix-indexed and specify them in Snake.config.json, one entry per sample:

"snv_calls"     : {
      "NA12878" : "path/to/snp/calls.vcf.gz"
  },
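To prepare an existing call set, the htslib tools `bgzip` and `tabix` compress and index a VCF, and `bcftools query -l` lists its sample names, which must match the SM tag in the BAM files. The helper names below are hypothetical, and the readiness check only looks at the file name and the presence of the index, not at the file contents.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for preparing an existing SNV call set for "snv_calls".

# Compress plain-text VCF $1 with bgzip and build a tabix index next to it
# (writes $1.gz and $1.gz.tbi).
index_vcf() {
  bgzip -c "$1" > "$1.gz" && tabix -p vcf "$1.gz"
}

# Return 0 if $1 ends in .vcf.gz, exists, and has a .tbi index beside it.
# Only names and presence are checked, not the file contents.
snv_file_ready() {
  local vcf="$1"
  [[ "$vcf" == *.vcf.gz ]] || { echo "expected a .vcf.gz file: $vcf" >&2; return 1; }
  [[ -f "$vcf" ]] || { echo "missing VCF: $vcf" >&2; return 1; }
  [[ -f "$vcf.tbi" ]] || { echo "missing tabix index: $vcf.tbi" >&2; return 1; }
}
```

Usage: `index_vcf calls.vcf && snv_file_ready calls.vcf.gz && bcftools query -l calls.vcf.gz`, then enter `calls.vcf.gz` under the matching sample name in `snv_calls`.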

Installation using Singularity/Docker

Will be updated soon.