Try it out now!
diffTF runs on Linux and macOS. The following quick start briefly summarizes the necessary steps to use our pipeline:
- Install the necessary tools (Snakemake, samtools, bedtools, and Subread).
Note
Note that all tools require Python 3.
We recommend installing them via conda, in which case the installation becomes as easy as:
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install snakemake bedtools samtools subread
If conda is not yet installed, follow the installation instructions. Installation is quick and easy. Make sure to open a new terminal after installation so that conda is available.
Note
You do not need to uninstall other Python installations or packages in order to use conda. Even if you already have a system Python, another Python installation from a source such as the macOS Homebrew package manager and globally installed packages from pip such as pandas and NumPy, you do not need to uninstall, remove, or change any of them before using conda.
If you want to install the tools manually and outside of the conda framework, see the following instructions for each of the tools: snakemake, samtools, bedtools, Subread.
Clone the Git repository:
git clone https://git.embl.de/grp-zaugg/diffTF
If you receive an error, Git may not be installed on your system. If you run Ubuntu, try the following command:
sudo apt-get install git
For macOS, there are multiple ways of installing it. If you already have Homebrew (http://brew.sh) installed, simply type:
brew install git
Otherwise, consult the internet on how to best install Git for your system.
To run the example analysis for 50 TFs, simply perform the following steps:
Change into the example/input directory within the Git repository:
cd diffTF/example/input
Download the data via the download script:
sh downloadAllData.sh
To test if the setup is correct, start a dry run via the first helper script:
sh startAnalysisDryRun.sh
Once the dry run is successful, start the analysis via the second helper script:
sh startAnalysis.sh
- To run your own analysis, modify the files config.json and sampleData.tsv. See the instructions in the section Run your own analysis for more details.
- If your analysis finished successfully, take a look at the FINAL_OUTPUT folder within your specified output directory, which contains the summary tables and visualizations of your analysis. If you received an error, see Section Handling errors to troubleshoot.
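The example run above can also be chained so that each step only starts if the previous one succeeded. The following is a minimal sketch, not part of diffTF itself: the function name `run_example` is hypothetical, and it assumes the three helper scripts named above are present in the directory passed as the first argument.

```shell
# Run the diffTF example steps in order and stop at the first failure.
# $1: path to the example/input directory of the cloned repository
run_example() {
    dir="$1"
    # The subshell keeps the caller's working directory unchanged
    (
        cd "$dir" &&
        sh downloadAllData.sh &&       # download the example data
        sh startAnalysisDryRun.sh &&   # dry run: checks the setup only
        sh startAnalysis.sh            # the actual analysis
    )
}
```

Called as `run_example diffTF/example/input`, the function returns a non-zero exit status as soon as one step fails, so the full analysis never starts after an unsuccessful dry run.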
Prerequisites
This section lists the required software and how to install it. As outlined in Section Try it out now!, the easiest way is to install all tools via conda. However, it is also possible to install the tools separately.
Snakemake
Please ensure that you have at least version 4.3 installed. In principle, there are multiple ways to install Snakemake. We recommend installing it, along with all the other required software, via conda.
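To check the installed version against the minimum of 4.3, something along the following lines may help. This is a sketch only: the helper `version_at_least` is a hypothetical name, and it assumes a `sort` that supports version ordering via `-V` (as in GNU coreutils).

```shell
# Succeeds (exit status 0) if version $1 is greater than or equal to version $2.
version_at_least() {
    # sort -V orders version strings; if the minimum sorts first, $1 >= $2
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v snakemake >/dev/null 2>&1; then
    installed="$(snakemake --version)"
    if version_at_least "$installed" "4.3"; then
        echo "Snakemake $installed is recent enough"
    else
        echo "Snakemake $installed is older than 4.3, please update"
    fi
else
    echo "snakemake was not found on PATH"
fi
```

A plain string comparison would rank 4.10 below 4.3, which is why the sketch relies on version ordering instead.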
samtools, bedtools, Subread
In addition, samtools, bedtools and Subread are needed to run diffTF. We recommend installing them, along with all the other required software, via conda.
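Before starting the pipeline, it can be useful to confirm that all required tools are reachable from the current shell. The sketch below is one way to do this; the function name `missing_tools` is hypothetical, and using `featureCounts` as the executable to look for is an assumption about how the Subread package exposes its programs.

```shell
# Print the name of every given command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# featureCounts is assumed here to be the relevant Subread executable
missing="$(missing_tools snakemake samtools bedtools featureCounts)"
if [ -z "$missing" ]; then
    echo "All required tools were found"
else
    echo "Missing tools:"
    echo "$missing"
fi
```

If conda was installed only moments ago, opening a new terminal first avoids false reports of missing tools.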
R and R packages
A working R installation is needed, and a number of packages from either CRAN or Bioconductor have to be installed. Type the following in R to install them:
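# CRAN packages
install.packages(c("checkmate", "futile.logger", "tidyverse", "reshape2", "RColorBrewer", "ggrepel", "lsr", "modeest", "boot", "grDevices", "pheatmap", "matrixStats", "locfdr"))
# Bioconductor packages. biocLite is the installer for Bioconductor 3.7 and older;
# on newer releases, use BiocManager::install() with the same package vector instead.
source("https://bioconductor.org/biocLite.R")
biocLite(c("limma", "vsn", "csaw", "DESeq2", "DiffBind", "geneplotter", "Rsamtools"))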
Run your own analysis
Running your own analysis is almost as easy as running the example analysis. Carefully read and follow the steps and notes below:
- Copy the files config.json and startAnalysis.sh to a directory of your choice.
- Modify the file config.json accordingly. For example, we strongly recommend running the analysis for all TFs instead of just 50 as in the example analysis. For this, simply change the parameter "TFs" to "all". See Section General configuration file for details about the meaning of the parameters. Do not delete or rename any parameters or sections.
- Create a tab-separated file that defines the input data, in analogy to the file sampleData.tsv from the example analysis, and refer to it in the file config.json (parameter summaryFile).
- Adapt the file startAnalysis.sh if necessary (the exact command line call to Snakemake and the various Snakemake-related parameters).
- Since running the pipeline is often computationally demanding, read Section Executing diffTF - Running times and memory requirements and decide on which machine to run the pipeline. In most cases, we recommend running diffTF in a cluster environment (see Section Running diffTF in a cluster environment for details). The pipeline is written in Snakemake, and we strongly suggest also reading Section Working with diffTF and FAQs to get a basic understanding of how the pipeline works.
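The first two steps of the list above (copying the files and switching "TFs" to "all") could be scripted roughly as follows. This is a sketch under stated assumptions: the function name `set_tfs_to_all` and the target directory `~/myAnalysis` are hypothetical, the files are assumed to live in `example/input`, and the "TFs" value is assumed to be a quoted string on a single line of config.json.

```shell
# Set the "TFs" parameter of a diffTF-style JSON config file to "all".
# $1: path to the config file to modify (work on a copy, not the original)
set_tfs_to_all() {
    config="$1"
    # Replace the quoted value that follows the "TFs" key; all other
    # parameters and sections are written back unchanged.
    sed -E 's/("TFs"[[:space:]]*:[[:space:]]*)"[^"]*"/\1"all"/' "$config" > "$config.tmp" &&
        mv "$config.tmp" "$config"
}

# Hypothetical usage (paths are assumptions, adjust them to your setup):
# mkdir -p ~/myAnalysis
# cp diffTF/example/input/config.json diffTF/example/input/startAnalysis.sh ~/myAnalysis/
# set_tfs_to_all ~/myAnalysis/config.json
```

Writing to a temporary file and renaming it sidesteps the differing in-place options of GNU and BSD sed, which matters because diffTF runs on both Linux and macOS.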