Try it out now!

The following quick start briefly summarizes the necessary steps to use our pipeline:

  1. Install the necessary tools (Snakemake, samtools, bedtools, and Subread).

Note

All tools require Python 3.

We recommend installing them via conda, in which case the installation is as easy as:

conda install -c bioconda snakemake bedtools samtools subread

If conda is not yet installed, follow the installation instructions. Installation is quick and easy.

Note

You do not need to uninstall, remove, or change any existing Python installations or packages in order to use conda. This holds even if you already have a system Python, another Python installation from a source such as the macOS Homebrew package manager, or globally installed pip packages such as pandas and NumPy.

If you want to install the tools manually and outside of the conda framework, see the following instructions for each of the tools: snakemake, samtools, bedtools, Subread.

  2. Clone the Git repository:

    git clone https://git.embl.de/grp-zaugg/diffTF

  3. To run the example analysis for 50 TFs, simply perform the following steps:

  • Change into the example/input directory within the Git repository

    cd diffTF/example/input
    
  • Download the data via the download script

    sh downloadAllData.sh
    
  • To test if the setup is correct, start a dryrun via the first helper script

    sh startAnalysisDryRun.sh
    
  • Once the dryrun is successful, start the analysis via the second helper script

    sh startAnalysis.sh
    
  4. To run your own analysis, modify the files config.json and sampleData.tsv. See the instructions in the section Run your own analysis for more details.
  5. If your analysis finished successfully, take a look into the FINAL_OUTPUT folder within your specified output directory, which contains the summary tables and visualizations of your analysis. If you receive an error, see Section Handling errors to troubleshoot.

Prerequisites

This section lists the required software and how to install it. As outlined in Section Try it out now!, the easiest way is to install everything via conda. However, it is also possible to install the tools separately.

Snakemake

Please ensure that you have at least version 4.3 installed. In principle, there are multiple ways to install Snakemake. We recommend installing it, along with all the other required software, via conda.
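As a quick way to check the version requirement above, the following is a minimal sketch, not part of diffTF itself. It assumes that `snakemake --version` prints a plain version string and that your `sort` supports the `-V` (version sort) option, as GNU coreutils does. The helper name `version_ge` is hypothetical.

```shell
# Succeeds if version $1 is greater than or equal to version $2.
# (version_ge is a hypothetical helper name used only for this sketch.)
version_ge() {
    # sort -V orders version strings; if the smaller one equals $2, then $1 >= $2
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v snakemake >/dev/null 2>&1; then
    installed=$(snakemake --version)
    if version_ge "$installed" "4.3"; then
        echo "Snakemake $installed is recent enough."
    else
        echo "Snakemake $installed is older than 4.3; please update."
    fi
else
    echo "snakemake was not found on PATH."
fi
```

A version sort is used instead of a plain string comparison because, compared as text, a version such as 4.10 would wrongly sort before 4.3.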

samtools, bedtools, Subread

In addition, samtools, bedtools and Subread are needed to run diffTF. We recommend installing them, along with all the other required software, via conda.
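To confirm that the tools listed above are reachable from your shell before starting the pipeline, a small check along the following lines may help. This is a sketch only: the helper name `missing_tools` is hypothetical, and `featureCounts` (one of the executables shipped with Subread) is used here as a stand-in for "Subread is installed", which is an assumption rather than something diffTF prescribes.

```shell
# Print the name of every given executable that cannot be found on the PATH.
# (missing_tools is a hypothetical helper name used only for this sketch.)
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: featureCounts serves as a proxy for the Subread package.
missing=$(missing_tools snakemake samtools bedtools featureCounts)

if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All required tools were found."
fi
```

If you installed the tools into a conda environment, run the check with that environment activated.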

R and R packages

A working R installation is needed and a number of packages from either CRAN or Bioconductor have to be installed. Type the following in R to install them:

install.packages(c("checkmate", "futile.logger", "tidyverse", "reshape2", "gridExtra", "scales", "jsonlite", "RColorBrewer", "rlist", "ggrepel", "lsr", "modeest", "locfdr", "boot"))
source("https://bioconductor.org/biocLite.R")
biocLite(c("limma", "vsn", "csaw", "DESeq2", "DiffBind", "geneplotter", "Rsamtools"))

Run your own analysis

Running your own analysis is almost as easy as running the example analysis. Read the following steps and notes carefully:

  1. Copy the files config.json and startAnalysis.sh to a directory of your choice.
  2. Modify the file config.json accordingly. For example, we strongly recommend running the analysis for all TFs instead of just 50 as in the example analysis. For this, simply change the parameter “TFs” to “all”. See Section General configuration file for details about the meaning of the parameters. Do not delete or rename any parameters or sections.
  3. Create a tab-separated file that defines the input data, analogous to the file sampleData.tsv from the example analysis, and refer to it in the file config.json (parameter summaryFile).
  4. Adapt the file startAnalysis.sh if necessary (the exact command line call to Snakemake and the various Snakemake-related parameters).
  5. Since running the pipeline might be computationally demanding, read Section Executing diffTF - Running times and memory requirements and decide on which machine to run the pipeline. In most cases, we recommend running diffTF in a cluster environment. The pipeline is written in Snakemake, and we strongly suggest also reading Section Working with diffTF and FAQs to get a basic understanding of how the pipeline works.
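Hand-editing config.json, as described in the steps above, can easily introduce a stray comma or a missing quote. The following sketch checks that the file is still well-formed JSON before you start the pipeline. It assumes that `python3` is available and that your edited config.json is in the current directory; the helper name `is_valid_json` is hypothetical. It checks syntax only and does not verify that all parameters diffTF expects are present.

```shell
# Succeeds if the given file parses as JSON (uses Python's built-in json.tool).
# (is_valid_json is a hypothetical helper name used only for this sketch.)
is_valid_json() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}

# Assumption: the edited copy of config.json is in the current directory.
if [ -f config.json ]; then
    if is_valid_json config.json; then
        echo "config.json is well-formed JSON."
    else
        echo "config.json has a syntax error; check commas, quotes, and braces."
    fi
fi
```

Running this check before the dry run separates plain syntax mistakes from configuration problems that only the pipeline itself can report.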