Commit f4c554ce authored by Paul Igor Costea

Update README.md to change SNP to SNV

parent 2550e85f
# MetaSNP, a metagenomic SNP calling pipeline
The metaSNP pipeline performs variant calling on aligned metagenomic samples and enables species delineation with sub-species resolution.
The metaSNV pipeline performs variant calling on aligned metagenomic samples and enables species delineation with sub-species resolution.
Download
......@@ -19,7 +19,7 @@ Dependencies
* Boost-1.53.0 or above
* samtools-1.19 or above
* htslib
* Python-2.7 or above
......@@ -52,12 +52,12 @@ II. Compilation:
III. Environmental Variables:
-----------------------------
We assume the metaSNP parent directory, `samtools` and `python` are present in your PATH.
We assume the metaSNV parent directory, `samtools` and `python` are present in your PATH.
You can set this variable temporarily by running the following command, or
permanently by adding the same line to your .bashrc:
export PATH=/path/2/metaSNP:$PATH
export PATH=/path/2/metaSNV:$PATH
Note: Replace '/path/2' with the corresponding global path.
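As a quick sanity check after setting PATH, something like the following could be used. It is only a sketch: `METASNV_DIR` and `check_tools` are hypothetical names introduced here, and the directory is the README's placeholder path, not a real install location.

```shell
# Placeholder from the README; replace '/path/2' with your own global path.
METASNV_DIR="/path/2/metaSNV"
export PATH="$METASNV_DIR:$PATH"

# Hypothetical helper: report whether the tools the README expects are on PATH.
check_tools() {
  for tool in samtools python; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
    fi
  done
}

check_tools
```

Running this in a fresh shell after editing .bashrc shows whether the permanent setting took effect.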
......@@ -74,7 +74,7 @@ Workflow:
## 1. Initiate a new project
metaSNP_NEW project_dirname
metaSNV_NEW project_dirname
Generates a structured results directory for your project.
......@@ -82,27 +82,27 @@ Generates a structured results directory for your project.
Note: Subsequent SNP filtering depends on these coverage estimations.
This part can be skipped if you only want to use the 'raw' SNP output and do not intend to balance the workload, or if you have already performed the pre-processing for the dataset.
### a) run metaSNP_COV
### a) run metaSNV_COV
metaSNP_COV project_dirname all_samples
metaSNV_COV project_dirname all_samples
The script generates a list of commandline jobs. Run each commandline job
before proceeding with the next step.
### b) run metaSNP_OPT
### b) run metaSNV_OPT
Helper script for workload balancing (reference genome splitting).
metaSNP_OPT project_dir/ genome_def nr_splits[int]
metaSNV_OPT project_dir/ genome_def nr_splits[int]
Note: This step requires an appropriate reference genome_definitions file (included in our database).
## 3. Part II: SNP calling
## 3. Part II: SNV calling
Note: This step can be run as a single job, independently of the pre-processing, or with an already existing genome split.
### a) metaSNP_SNP
### a) metaSNV_SNV
metaSNP_SNP project_dir/ all_samples ref_db [options]
metaSNV_SNV project_dir/ all_samples ref_db [options]
The script generates a list of commandline jobs. Submit each commandline to
your HPC cluster or compute locally.
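For local runs, the generated job list could be executed one line at a time, stopping at the first failure. The sketch below assumes the job file holds one shell command per line; `run_jobs` is a hypothetical helper and not part of the pipeline, and the demo job list is a stand-in for a real file such as the one redirected from the script's output.

```shell
# Hypothetical helper (not part of metaSNV): run a job list sequentially.
run_jobs() {
  job_file="$1"
  n=0
  while IFS= read -r job || [ -n "$job" ]; do
    # Skip blank lines
    if [ -z "$job" ]; then continue; fi
    n=$((n + 1))
    echo "[job $n] $job" >&2
    # Detach stdin so a job cannot consume the rest of the job list;
    # stop at the first failing job so later steps never see partial results.
    if ! bash -c "$job" < /dev/null; then
      echo "job $n failed" >&2
      return 1
    fi
  done < "$job_file"
  echo "$n jobs finished" >&2
}

# Stand-in job list for illustration only
demo_jobs="$(mktemp)"
printf '%s\n' 'echo first' 'echo second' > "$demo_jobs"
run_jobs "$demo_jobs"
```

On a cluster, each line would instead be handed to the scheduler's own submit command, which this sketch does not cover.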
......@@ -154,7 +154,7 @@ Example Tutorial
## 3. Initiate a new project in the parent directory
$ metaSNP_New tutorial
$ metaSNV_New tutorial
## 4. Generate the 'all_samples' file
......@@ -162,23 +162,23 @@ Example Tutorial
## 5. Prepare and run the coverage estimation
$ metaSNP_COV tutorial/ tutorial/all_samples > runCoverage
$ metaSNV_COV tutorial/ tutorial/all_samples > runCoverage
$ bash runCoverage
## 6. Perform a work load balancing step for run time optimization.
$ metaSNP_OPT tutorial/ db/Genomev9_definitions 5
$ metaSNV_OPT tutorial/ db/Genomev9_definitions 5
$ bash runCoverage
## 7. Prepare and run the SNP calling step
## 7. Prepare and run the SNV calling step
$ metaSNP_SNP tutorial/ tutorial/all_samples db/RepGenomesv9.fna -a db/RefOrganismDB_v9_gene.clean -l tutorial/bestsplits/ > runSNPcall
$ metaSNV_SNP tutorial/ tutorial/all_samples db/RepGenomesv9.fna -a db/RefOrganismDB_v9_gene.clean -l tutorial/bestsplits/ > runSNPcall
$ bash runSNPcall
## 8. Run the post processing / filtering steps
### a) Compute allele frequencies for each position that passes the given thresholds.
$ metaSNP_filtering.py tutorial/tutorial.all_perc.tab tutorial/tutorial.all_cov.tab tutorial/snpCaller/called_SNPs.best_split_* tutorial/all_samples tutorial/filtered/pop/
$ metaSNV_filtering.py tutorial/tutorial.all_perc.tab tutorial/tutorial.all_cov.tab tutorial/snpCaller/called_SNVs.best_split_* tutorial/all_samples tutorial/filtered/pop/
### b) Compute pair-wise distances between samples on their SNP profiles and create a PCoA plot.
......@@ -189,31 +189,31 @@ Advanced usage (tools and scripts)
==================================
If you are interested in using the pipeline in a more manual way (for example
the metaSNP caller stand alone), you will find the executables for the
the metaSNV caller stand alone), you will find the executables for the
individual steps in the `src/` directory.
You will find scripts as well as the binaries for qaCompute and the metaSNP
caller in their corresponding directories (src/qaCompute and src/snpCaller)
post compilation.
metaSNP caller
metaSNV caller
--------------
Calls SNPs from samtools pileup format and generates two outputs.
Calls SNVs from samtools pileup format and generates two outputs.
usage: ./snpCall [options] < stdin.mpileup > std.out.popSNPs
Options:
-f, faidx indexed reference genome.
-g, gene annotation file.
-i, individual SNPs.
-i, individual SNVs.
Note: Expecting samtools mpileup as standard input
### __Output__
1. Population SNPs (pSNPs):
1. Population SNVs (pSNVs):
Population wide variants that occur with a frequency of 1 % at positions with at least 4x coverage.
2. Individual specific SNPs (iSNPs):
2. Individual specific SNVs (iSNVs):
Non population variants, that occur with a frequency of 10 % at positions with at least 10x coverage.
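The two output definitions above can be read as simple coverage and frequency cut-offs. The sketch below encodes that reading in shell. The function names are hypothetical, and treating the stated values as minimums (at least 1 % at 4x, at least 10 % at 10x) is an assumption, not something taken from the caller's source.

```shell
# Hypothetical helpers, not part of metaSNV.
# passes_threshold COV FREQ MIN_COV MIN_FREQ: succeeds when both minimums are met.
passes_threshold() {
  awk -v cov="$1" -v freq="$2" -v min_cov="$3" -v min_freq="$4" \
    'BEGIN { exit !((cov + 0 >= min_cov + 0) && (freq + 0 >= min_freq + 0)) }'
}

# Assumed reading of the README: 1 % frequency at 4x coverage (population),
# 10 % frequency at 10x coverage (individual).
passes_population() { passes_threshold "$1" "$2" 4 0.01; }
passes_individual() { passes_threshold "$1" "$2" 10 0.10; }

# A position with 25x coverage and a 3 % variant frequency
if passes_population 25 0.03; then echo "population candidate"; fi
if ! passes_individual 25 0.03; then echo "below individual threshold"; fi
```

Frequencies are passed as fractions here (0.03 for 3 %); the real caller's input and output formats are not shown in this diff.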
......@@ -257,14 +257,14 @@ Also counts unmapped and sub-par quality reads.
For more info on the parameters try ./qaCompute
metaSNP_filtering.py
metaSNV_filtering.py
--------------------
usage: metaSNP filtering [-h] [-p PERC] [-c COV] [-m MINSAMPLES] [-s SNPC]
usage: metaSNV filtering [-h] [-p PERC] [-c COV] [-m MINSAMPLES] [-s SNPC]
[-i SNPI]
perc_FILE cov_FILE snp_FILE [snp_FILE ...]
all_samples output_dir/
metaSNP filtering
metaSNV filtering
positional arguments:
perc_FILE input file with horizontal genome (taxon) coverage
......