Try it out now!
The following quick start briefly summarizes the necessary steps to use our pipeline:
- Install the necessary tools (Snakemake, samtools, bedtools, and Subread).
Note
All tools require Python 3.
We recommend installing them via conda, in which case the installation is as easy as
conda install -c bioconda snakemake bedtools samtools subread
If conda is not yet installed, follow the installation instructions. Installation is quick and easy.
Note
You do not need to uninstall other Python installations or packages in order to use conda. An existing system Python, a Python installation from another source such as the macOS Homebrew package manager, and globally installed pip packages such as pandas and NumPy can all stay as they are.
If you want to install the tools manually and outside of the conda framework, see the following instructions for each of the tools: snakemake, samtools, bedtools, Subread.
- Clone the Git repository:
  git clone https://git.embl.de/grp-zaugg/diffTF
- To run the example analysis for 50 TFs, perform the following steps:
  - Change into the example/input directory within the Git repository:
    cd diffTF/example/input
  - Download the data via the download script:
    sh downloadAllData.sh
  - To test whether the setup is correct, start a dry run via the first helper script:
    sh startAnalysisDryRun.sh
  - Once the dry run is successful, start the analysis via the second helper script:
    sh startAnalysis.sh
- To run your own analysis, modify the files config.json and sampleData.tsv. See Section Run your own analysis for more details.
- If your analysis finished successfully, take a look at the FINAL_OUTPUT folder within your specified output directory, which contains the summary tables and visualizations of your analysis. If you received an error, see Section Handling errors for troubleshooting.
Prerequisites
This section lists the required software and how to install it. As outlined in Section Try it out now!, the easiest way is to install everything via conda. However, the tools can also be installed separately.
Snakemake
Please ensure that you have at least version 4.3 installed. There are multiple ways to install Snakemake. We recommend installing it, along with all the other required software, via conda.
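To compare the installed version against the 4.3 minimum, a small shell sketch like the following may help. The helper name version_ge is made up for this example, and the comparison assumes a sort that supports -V (version sort, as in GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="4.3"
if command -v snakemake >/dev/null 2>&1; then
    installed="$(snakemake --version)"
    if version_ge "$installed" "$required"; then
        echo "Snakemake $installed is recent enough (>= $required)"
    else
        echo "Snakemake $installed is too old, need >= $required"
    fi
else
    echo "snakemake not found on PATH"
fi
```

A plain string comparison would rank 4.10 below 4.3, which is why the sketch relies on version sort instead.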
samtools, bedtools, Subread
In addition, samtools, bedtools and Subread are needed to run diffTF. We recommend installing them, along with all the other required software, via conda.
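After installation, it can be useful to confirm that the tools are actually found on the PATH before starting the pipeline. The sketch below does this with command -v. The helper name missing_tools is hypothetical, and featureCounts is checked as a representative Subread executable, which is an assumption of this example rather than something stated here.

```shell
#!/bin/sh
# Hypothetical helper: prints each given command name that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# featureCounts ships with Subread; checking for it is an assumption of this sketch.
missing="$(missing_tools snakemake samtools bedtools featureCounts)"
if [ -z "$missing" ]; then
    echo "All required tools found"
else
    echo "Missing tools:"
    echo "$missing"
fi
```

If conda was used for the installation, make sure the corresponding conda environment is active when running the check.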
R and R packages
A working R installation is needed, and a number of packages from CRAN and Bioconductor have to be installed. Type the following in R to install them:
# CRAN packages
install.packages(c("checkmate", "futile.logger", "tidyverse", "reshape2", "gridExtra", "scales", "jsonlite", "RColorBrewer", "rlist", "ggrepel", "lsr", "modeest", "locfdr", "boot"))
# Bioconductor packages via biocLite (Bioconductor 3.8 and later use BiocManager::install() instead)
source("https://bioconductor.org/biocLite.R")
biocLite(c("limma", "vsn", "csaw", "DESeq2", "DiffBind", "geneplotter", "Rsamtools"))
Run your own analysis
Running your own analysis is almost as easy as running the example analysis. Read the following steps and notes carefully:
- Copy the files config.json and startAnalysis.sh to a directory of your choice.
- Modify the file config.json accordingly. For example, we strongly recommend running the analysis for all TFs instead of just 50 as in the example analysis. To do so, change the parameter “TFs” to “all”. See Section General configuration file for details about the meaning of the parameters. Do not delete or rename any parameters or sections.
- Create a tab-separated file that defines the input data, analogous to the file sampleData.tsv from the example analysis, and refer to it in the file config.json (parameter summaryFile).
- Adapt the file startAnalysis.sh if necessary (the exact command line call to Snakemake and the various Snakemake-related parameters).
- Since running the pipeline might be computationally demanding, read Section Executing diffTF - Running times and memory requirements and decide on which machine to run the pipeline. In most cases, we recommend running diffTF in a cluster environment. The pipeline is written in Snakemake, and we strongly suggest also reading Section Working with diffTF and FAQs to get a basic understanding of how the pipeline works.
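The first two steps could be scripted roughly as follows. This is only a sketch: the helper name set_tfs_all is made up, and it assumes that config.json stores the parameter on a single line as a quoted string of the form "TFs": "...". If the file is laid out differently, edit it by hand instead.

```shell
#!/bin/sh
# Hypothetical helper: set the "TFs" parameter in the given config file to "all".
# Assumes the entry looks like  "TFs": "<some value>"  on a single line.
# A backup of the original file is kept as <file>.bak.
set_tfs_all() {
    sed -i.bak 's/"TFs"[[:space:]]*:[[:space:]]*"[^"]*"/"TFs": "all"/' "$1"
}

# Example usage (all paths are placeholders, adjust them to your setup):
#   mkdir -p my_analysis
#   cp diffTF/example/input/config.json diffTF/example/input/startAnalysis.sh my_analysis/
#   set_tfs_all my_analysis/config.json
```

The substitution only touches the value of "TFs" and leaves all other parameters and sections in place, in line with the advice not to delete or rename anything in the file.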