# MetaSNV, a metagenomic SNV calling pipeline


The metaSNV pipeline performs variant calling on aligned metagenomic samples.


Download
========

Via Git:

    git clone git@git.embl.de:costea/metaSNV.git
    
or [download](https://git.embl.de/rmuench/metaSNP/repository/archive.zip?ref=master) a zip file of the repository.

Dependencies
============

* [Boost-1.53.0 or above](http://www.boost.org/users/download/)

* [htslib](http://www.htslib.org/)
 
* Python-2.7 or above
    * numpy
    * pandas

### Installing dependencies on Ubuntu/Debian

On an Ubuntu/Debian system, the following sequence of commands will install the
required packages (the first two are only necessary if you have not yet enabled
the universe repository):


    sudo add-apt-repository "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc) universe"
    sudo apt-get update

    sudo apt-get install libhts-dev libboost-dev

### Installing dependencies using anaconda

If you use [anaconda](https://www.continuum.io/downloads), you can create an
environment with all necessary dependencies using the following commands:

    conda create --name metaSNV boost htslib pkg-config
    source activate metaSNV
    export CFLAGS=-I$CONDA_ENV_PATH/include
    export LD_LIBRARY_PATH=$CONDA_ENV_PATH/lib:$LD_LIBRARY_PATH

If you do not have a C++ compiler, anaconda can also install G++:

    conda create --name metaSNV boost htslib pkg-config
    source activate metaSNV

    # Add this command:
    conda install gcc

    export CFLAGS=-I$CONDA_ENV_PATH/include
    export LD_LIBRARY_PATH=$CONDA_ENV_PATH/lib:$LD_LIBRARY_PATH

Setup & Compilation
===================

    make

Workflow:
=========
## Required Files:

* **'all\_samples'**  = a list of all BAM files, one /path/to/sample.bam per line (no duplicates)
* **'ref\_db'**       = the reference database in FASTA format (e.g. a multi-sequence FASTA file)
* **'gen\_pos'**      = a list with start and end positions for each sequence in the reference (format: `sequence_id start end`)
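
A `gen_pos` file can in principle be derived from the reference FASTA itself. The following is a minimal Python 3 sketch and not part of metaSNV: the function names, the tab separator, and the convention of a 0-based start with the sequence length as end (BED-like) are all assumptions, so compare the output with a known-good file before relying on it.

```python
def fasta_lengths(fasta_path):
    """Yield (sequence_id, length) for each record in a FASTA file."""
    seq_id, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, length
                # The ID is the header text up to the first whitespace.
                parts = line[1:].split()
                seq_id, length = (parts[0] if parts else ""), 0
            elif seq_id is not None:
                length += len(line)
    if seq_id is not None:
        yield seq_id, length


def write_gen_pos(fasta_path, out_path):
    """Write one 'sequence_id<TAB>start<TAB>end' line per sequence.

    Assumption: start is 0 and end is the sequence length (BED-like).
    """
    with open(out_path, "w") as out:
        for seq_id, length in fasta_lengths(fasta_path):
            out.write("{}\t0\t{}\n".format(seq_id, length))
```

Reading the file line by line keeps memory use low even for large multi-sequence references.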

## Optional Files:
* **'db\_ann'** = a gene annotation file for the reference database (format: ).

## To use one of the provided reference databases:

    ./getRefDB.sh
    
## 2. Run metaSNV

    metaSNV.py project_dir/ all_samples ref_db [options]

## 3. Part II: Post-Processing (Filtering & Analysis)
Note: this requires SNV calling (Part I, step 2 above) to be completed first!

    metaSNV_post.py project_dir [options]
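
Because post-processing depends on the calling step, the two commands can be chained so that the second one never starts after a failure of the first. The following is a minimal Python 3 sketch under stated assumptions: `build_commands` and `run_pipeline` are hypothetical helper names, both scripts are assumed to sit in the current directory, and they are assumed to exit with a non-zero status on failure.

```python
import subprocess
import sys


def build_commands(project_dir, all_samples, ref_db, extra_args=()):
    """Return the two command lines: SNV calling first, post-processing second."""
    call = [sys.executable, "metaSNV.py", project_dir, all_samples, ref_db]
    call.extend(extra_args)  # e.g. options taken from the tutorial below
    post = [sys.executable, "metaSNV_post.py", project_dir]
    return [call, post]


def run_pipeline(commands):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so post-processing is skipped if the calling step fails.
        subprocess.run(cmd, check=True)
```

A call such as `run_pipeline(build_commands("project_dir", "all_samples", "ref_db"))` would then mirror the two steps above.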

Example Tutorial
================

## 1. Run the setup & compilation steps and download the provided reference database.

    ./getRefDB.sh

## 2. Go to the EXAMPLE directory and download the samples with getSamplesScript.sh

    $ cd EXAMPLE
    $ ./getSamplesScript.sh

## 3. Make sample list

    $ find `pwd`/EXAMPLE/samples -name "*.bam" > sample_list
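
The same list can be built in Python, which makes it easy to guarantee that no path appears twice. This is a minimal Python 3 sketch and not part of metaSNV; `find_bam_files` and `write_sample_list` are hypothetical helper names.

```python
import os


def find_bam_files(samples_dir):
    """Return sorted absolute paths of all *.bam files under samples_dir."""
    found = set()  # a set guarantees that no path is listed twice
    for root, _dirs, files in os.walk(samples_dir):
        for name in files:
            if name.endswith(".bam"):
                found.add(os.path.abspath(os.path.join(root, name)))
    return sorted(found)


def write_sample_list(samples_dir, out_path="sample_list"):
    """Write one BAM path per line and return the number of samples."""
    paths = find_bam_files(samples_dir)
    with open(out_path, "w") as out:
        for path in paths:
            out.write(path + "\n")
    return len(paths)
```

Index files such as `sample.bam.bai` are skipped because only names ending in `.bam` are collected.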

## 4. Run the SNV calling step

    $ python metaSNV.py tutorial sample_list db/freeze9.genomes.RepGenomesv9.fna --threads 8 --ctg_len db/freeze9.len.def.bed

## 5. Run filtering and post processing

    $ python metaSNV_post.py tutorial
    
Voilà! Your distances will be in the `tutorial/distances` folder. Enjoy!

Advanced usage 
==================================

If you want to process many samples and use the power of your cluster, metaSNV can print out the commands that need to be
run, and you can decide how to schedule and manage them.

## 1. Get the first set of commands
    
    $ python metaSNV.py tutorial sample_list db/freeze9.genomes.RepGenomesv9.fna --n_splits 8 --ctg_len db/freeze9.len.def.bed --print-commands
    
Note the addition of `--print-commands`. This prints the one-liners that you need to run. When they have all finished, run the same command again.
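
On a single multi-core machine, the printed one-liners can also be executed concurrently without a cluster scheduler. The following is a minimal Python 3 sketch under stated assumptions: the printed commands are assumed to have been saved to a file with one command per line (the exact shape of the printed output is not documented here), and `read_commands` and `run_commands` are hypothetical helper names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_commands(path):
    """Return the non-empty, non-comment lines of a command file."""
    with open(path) as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]


def run_commands(commands, max_workers=4):
    """Run shell one-liners concurrently; return a list of (command, exit_code)."""
    def run_one(cmd):
        # shell=True because each entry is a complete shell one-liner.
        return cmd, subprocess.run(cmd, shell=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input commands.
        return list(pool.map(run_one, commands))
```

Checking that every returned exit code is 0 before re-running metaSNV avoids continuing with incomplete intermediate results. Threads are sufficient here because the actual work happens in the child processes.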

## 2. Get the second set of commands
 
    $ python metaSNV.py tutorial sample_list db/freeze9.genomes.RepGenomesv9.fna --n_splits 8 --ctg_len db/freeze9.len.def.bed --print-commands
    
This will calculate the "load balancing" and print the commands for running the SNV calling.
    
## 3. Run post-processing as usual

    $ python metaSNV_post.py tutorial