Commit 7c3e211d authored by Christian Arnold

Cleanup docs directory

parent fc3dff29
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: c061cd756f79b7c0da356056dfd19556
tags: 645f666f9bcd5a90fca523b33c5a78b7
.. _docs-quickstart:
Try it out now!
============================================================
*diffTF* runs on Linux and macOS and, if combined with ``Singularity``, is even independent of the operating system. The following quick start briefly summarizes the necessary steps to install and use it.
In principle, there are two ways of installing *diffTF* and the required tools:
1a. The "easy" way: Using ``Singularity`` and our preconfigured *diffTF* containers that contain all necessary tools, R, and R libraries
You only need to install Snakemake (see below for details) and ``Singularity``. Snakemake supports ``Singularity`` version 2.4 and later. You can check whether you already have ``Singularity`` installed by simply typing

.. code-block:: Bash

   singularity --version

Snakemake requires at least ``Singularity`` version 2.4. If your version is older, please update to the latest ``Singularity`` version.

.. note:: Make sure to read the section :ref:`docs-singularityNotes` carefully!

1b. The "more complicated" way: Install the necessary tools (*Snakemake*, *samtools*, *bedtools*, *Subread*, and *R* along with various packages).
.. note:: Note that all tools require Python 3.
We recommend installing all tools except R via conda, in which case installation is as easy as

.. code-block:: Bash

   conda config --add channels defaults
   conda config --add channels conda-forge
   conda config --add channels bioconda
   conda install snakemake bedtools samtools subread

If conda is not yet installed, follow the `installation instructions <https://conda.io/docs/user-guide/install/index.html>`_. Installation is quick and easy. Make sure to open a new terminal after installation, so that *conda* is available.
.. note:: You do not need to uninstall other Python installations or packages in order to use conda. Even if you already have a system Python, another Python installation from a source such as the macOS Homebrew package manager and globally installed packages from pip such as pandas and NumPy, you do not need to uninstall, remove, or change any of them before using conda.
If you want to install the tools manually and outside of the conda framework, see the following instructions for each of the tools: `snakemake <http://snakemake.readthedocs.io/en/stable/getting_started/installation.html>`_, `samtools <http://www.htslib.org/download>`_, `bedtools <http://bedtools.readthedocs.io/en/latest/content/installation.html>`_, `Subread <http://subread.sourceforge.net>`_.
In addition, *R* is needed along with various packages (see below for details).
2. Clone the Git repository:

.. code-block:: Bash

   git clone https://git.embl.de/grp-zaugg/diffTF

If you receive an error, *Git* may not be installed on your system. If you run Ubuntu, try the following command:

.. code-block:: Bash

   sudo apt-get install git

For macOS, there are multiple ways of installing it. If you already have *Homebrew* (http://brew.sh) installed, simply type:

.. code-block:: Bash

   brew install git

Otherwise, search online for how best to install Git on your system.
3. To run the example analysis for 50 TFs, perform the following steps:

* Change into the ``example/input`` directory within the Git repository:

  .. code-block:: Bash

     cd diffTF/example/input

* Download the data via the download script:

  .. code-block:: Bash

     sh downloadAllData.sh

* To test whether the setup is correct, start a dry run via the first helper script:

  .. code-block:: Bash

     sh startAnalysisDryRun.sh

* Once the dry run is successful, start the analysis via the second helper script. If you want to use ``Singularity`` (which we strongly recommend), simply edit the file and add the ``--use-singularity`` command line argument to the other arguments (see the Snakemake documentation and the section :ref:`docs-singularityNotes` for more details).

  .. code-block:: Bash

     sh startAnalysis.sh

4. To run your own analysis, modify the files ``config.json`` and ``sampleData.tsv``. See the instructions in the section `Run your own analysis`_ for more details.
5. If your analysis finished successfully, take a look at the ``FINAL_OUTPUT`` folder within your specified output directory, which contains the summary tables and visualizations of your analysis. If you received an error, see Section :ref:`docs-errors` for troubleshooting.
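
The three commands from step 3 can also be chained so that the actual analysis only starts after a successful dry run. The following is a minimal sketch and not part of *diffTF*: ``run_example`` is a hypothetical helper, and it assumes that the helper scripts ``downloadAllData.sh``, ``startAnalysisDryRun.sh`` and ``startAnalysis.sh`` are located in the directory passed as the first argument.

.. code-block:: Bash

   # Hypothetical helper (not shipped with diffTF): runs the example steps
   # in order and stops at the first step that fails.
   run_example() {
      # Subshell, so the caller's working directory stays unchanged.
      (
         cd "$1" || exit 1
         sh downloadAllData.sh || { echo "Download failed" >&2; exit 1; }
         sh startAnalysisDryRun.sh || { echo "Dry run failed, not starting the analysis" >&2; exit 1; }
         sh startAnalysis.sh
      )
   }

For example, call ``run_example diffTF/example/input`` from the directory into which you cloned the repository.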
.. _docs-prerequisites:
Prerequisites for the "easy" way
==================================
The only prerequisite here is that Snakemake and ``Singularity`` must be installed on the system on which you want to run *diffTF*. See above for details on the supported versions. For details on how to install Snakemake, see below.
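
To check whether the installed versions meet the minimum requirements stated in this document (``Singularity`` 2.4 and Snakemake 5.3), a small Bash sketch such as the following can be used. The function names are hypothetical, and the version parsing assumes that ``--version`` prints a dotted version number, for example ``2.6.0-dist`` or ``singularity version 3.0.1``.

.. code-block:: Bash

   # Hypothetical helpers (not part of diffTF) to check minimum tool versions.

   # Returns 0 if dotted version $1 >= $2, comparing numerically field by field.
   version_ge() {
      local IFS=. i x y
      local -a a b
      a=($1)
      b=($2)
      for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
         x=${a[i]:-0}
         y=${b[i]:-0}
         if ((10#$x > 10#$y)); then return 0; fi
         if ((10#$x < 10#$y)); then return 1; fi
      done
      return 0
   }

   # Prints the first dotted number found in the output of "<tool> --version".
   tool_version() {
      "$1" --version 2>/dev/null | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
   }

   # Reports whether <tool> is installed with at least version <min>.
   check_min_version() {
      local tool=$1 min=$2 v
      if ! command -v "$tool" >/dev/null 2>&1; then
         echo "MISSING $tool"
         return 1
      fi
      v=$(tool_version "$tool")
      if [ -n "$v" ] && version_ge "$v" "$min"; then
         echo "OK $tool $v"
      else
         echo "TOO OLD $tool ${v:-unknown} (need >= $min)"
         return 1
      fi
   }

For example, ``check_min_version singularity 2.4`` and ``check_min_version snakemake 5.3`` print ``OK``, ``MISSING`` or ``TOO OLD`` and return a non-zero exit code if the requirement is not met. Versions are compared field by field as numbers, so that ``5.10`` is correctly treated as newer than ``5.3``.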
Prerequisites for the "manual" way
=====================================
Note that most of this section is only relevant if you use Snakemake without ``Singularity``. This section lists the required software and how to install them. As outlined in Section :ref:`docs-quickstart`, the easiest way is to install all of them via ``conda``. However, it is of course also possible to install the tools separately.
Snakemake
--------------------------
Please ensure that you have at least version 5.3 installed. In principle, there are `multiple ways to install Snakemake <http://snakemake.readthedocs.io/en/stable/getting_started/installation.html>`_. We recommend installing it, along with all the other required software, via conda.
*samtools*, *bedtools*, *Subread*
----------------------------------
In addition, `samtools <http://www.htslib.org/download>`_, `bedtools <http://bedtools.readthedocs.io>`_ and `Subread <http://subread.sourceforge.net>`_ are needed to run *diffTF*. We recommend installing them, along with all the other required software, via conda.
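
To verify that these tools are available on your ``PATH`` (for example, after installing them via conda), a check along the following lines may help. This is a hypothetical sketch; it assumes that *Subread* is represented by its ``featureCounts`` executable.

.. code-block:: Bash

   # Hypothetical helper (not part of diffTF): lists the given commands
   # that cannot be found on the PATH.
   missing_tools() {
      local tool
      local -a missing=()
      for tool in "$@"; do
         command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
      done
      if ((${#missing[@]} > 0)); then
         echo "missing: ${missing[*]}"
         return 1
      fi
      echo "all tools found"
   }

For example, ``missing_tools snakemake samtools bedtools featureCounts`` either reports that all tools were found or lists the missing ones and returns a non-zero exit code.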
R and R packages
--------------------------
A working ``R`` installation is needed, and a number of packages from either CRAN or Bioconductor have to be installed. Type the following in ``R`` to install them:

.. code-block:: R

   # CRAN packages (grDevices ships with base R and needs no installation)
   install.packages(c("checkmate", "futile.logger", "tidyverse", "reshape2", "RColorBrewer", "ggrepel", "lsr", "modeest", "boot", "pheatmap", "matrixStats", "locfdr"))
   # Bioconductor packages; biocLite() is deprecated, BiocManager is the current installer
   if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
   BiocManager::install(c("limma", "vsn", "csaw", "DESeq2", "DiffBind", "geneplotter", "Rsamtools"))

.. _docs-runOwnAnalysis:
Run your own analysis
============================================================
Running your own analysis is almost as easy as running the example analysis. Carefully read and follow these steps and notes:
1. Copy the files ``config.json`` and ``startAnalysis.sh`` to a directory of your choice.
2. Modify the file ``config.json`` accordingly. For example, we strongly recommend running the analysis for all TFs instead of just the 50 used in the example analysis. To do so, simply change the value of the parameter ``TFs`` to ``all``. See Section :ref:`configurationFile` for details about the meaning of the parameters. Do not delete or rename any parameters or sections.
3. Create a **tab-separated** file that defines the input data, in analogy to the file ``sampleData.tsv`` from the example analysis, and refer to that file in ``config.json`` (parameter ``summaryFile``).
4. Adapt the file ``startAnalysis.sh`` if necessary (the exact command line call to Snakemake and the various Snakemake-related parameters). If you run with Singularity, see the section below for modifications.
5. Since running the pipeline is often computationally demanding, read Section :ref:`timeMemoryRequirements` and decide on which machine to run the pipeline. In most cases, we recommend running *diffTF* in a cluster environment (see Section :ref:`clusterEnvironment` for details). The pipeline is written in Snakemake, and we strongly suggest also reading Section :ref:`workingWithPipeline` to get a basic understanding of how the pipeline works.
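
Because a sample file that is not strictly tab-separated (for example, one saved with spaces instead of tabs) is an easy mistake to make, it can be worth checking the file before starting the pipeline. The following is a hypothetical sketch that only checks the structure, that is, that every line has the same number of tab-separated columns as the header; it does not check the column names or values that *diffTF* expects.

.. code-block:: Bash

   # Hypothetical helper (not part of diffTF): checks that a file is
   # tab-separated and that all lines have as many columns as the header.
   check_tsv() {
      awk -F '\t' '
         NR == 1 {
            n = NF
            if (n < 2) {
               printf "line 1: only %d column(s), is the file tab-separated?\n", n
               bad = 1
            }
            next
         }
         NF != n {
            printf "line %d: %d columns, but the header has %d\n", NR, NF, n
            bad = 1
         }
         END {
            if (NR == 0) { print "file is empty"; bad = 1 }
            exit bad
         }
      ' "$1"
   }

For example, ``check_tsv sampleData.tsv`` prints nothing and returns 0 for a well-formed file, and otherwise reports the offending lines and returns 1.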
.. _docs-singularityNotes:
Adaptations and notes when running with Singularity
============================================================
You only have to add the ``--use-singularity`` argument to Snakemake. In that case, each rule will be executed in pre-configured isolated containers that contain all necessary tools. Please note the following important issues related to ``Singularity``:
- You may want to add the ``--singularity-prefix`` argument to store all ``Singularity`` containers in a central place instead of the local ``.snakemake`` directory. If you intend to run multiple *diffTF* analyses in different locations, this saves space and time because the containers do not have to be downloaded each time and stored in multiple locations.
- .. warning:: If you use ``Singularity`` version 3, make sure you have at least version 3.0.2 installed or the latest pull from version 3.0.1, as there was an issue with Snakemake and particular ``Singularity`` versions. For more details, see `here <https://bitbucket.org/snakemake/snakemake/issues/1017/snakemake-process-suspended-upon-execution>`_.
- .. warning:: If you reference files in the ``config.json`` that are located outside of the directory from which you call Snakemake (for example, in parent directories), you have to use the ``--singularity-args`` command line argument to bind the additional parent directories to the container so that they are also available inside the container. Otherwise, only (!) the directory from which you start the analysis and its subfolders are visible inside the container, but no parent folders. In addition, make sure the mounted paths are identical inside and outside the container. For example, if you reference the files ``/g/group1/user1/mm10.fa`` and ``/g/group2/user1/files/bla.txt`` in the config file, use ``--singularity-args "--bind /g:/g"``. Thus, you have to mount the parent directory of all files you reference outside of your current directory. If both files were located in ``/g/group1``, you could therefore also use ``--bind /g/group1:/g/group1``.
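
The binding rule from the last point (mount the deepest parent directory shared by all referenced files, at the same path inside and outside the container) can be computed with a small Bash sketch. The function names are hypothetical, and the sketch assumes absolute file paths.

.. code-block:: Bash

   # Hypothetical helpers (not part of diffTF).

   # Prints the deepest directory that contains all given absolute file paths.
   common_dir() {
      local prefix path
      prefix=$(dirname -- "$1")
      shift
      for path in "$@"; do
         path=$(dirname -- "$path")
         # Move prefix up one level until path equals prefix or lies below it.
         # The trailing "/" prevents /g/group1 from matching /g/group10.
         while [ "$prefix" != / ] && [ "$path" != "$prefix" ] &&
            [ "${path#"$prefix"/}" = "$path" ]; do
            prefix=$(dirname -- "$prefix")
         done
      done
      printf '%s\n' "$prefix"
   }

   # Prints a --bind argument that mounts the common directory at the same path.
   singularity_bind_args() {
      local dir
      dir=$(common_dir "$@")
      printf '%s\n' "--bind $dir:$dir"
   }

For the two example files above, ``singularity_bind_args /g/group1/user1/mm10.fa /g/group2/user1/files/bla.txt`` prints ``--bind /g:/g``, which can then be passed to Snakemake via ``--singularity-args``. For two files that are both located below ``/g/group1``, it prints ``--bind /g/group1:/g/group1``.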
.. _docs-details:
Input
======
Workflow
=========
Working with the pipeline
=========================
Handling errors
===============
Output
======
.. diffTF documentation master file, created by
   sphinx-quickstart on Thu Nov 30 13:16:14 2017.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

=========================================
Welcome to the documentation of *diffTF*!
=========================================
Welcome to the *diffTF* documentation, and thank you for your interest in our software! These pages provide documentation and additional information for the *diffTF* pipeline.
To get oriented, check the menu on the left or use the search field in the upper left corner.
This site is organized into the following three parts:

.. toctree::
   :maxdepth: 2
   :caption: Quick Start and Installation

   chapter1.rst

.. toctree::
   :maxdepth: 2
   :caption: Pipeline Details

   chapter2.rst

.. toctree::
   :maxdepth: 2
   :caption: Project Information

   projectInfo.rst

.. _docs-project:
Biological motivation
============================
Transcription factor (TF) activity constitutes an important readout of cellular signalling pathways and thus for assessing regulatory differences across conditions. However, current technologies lack the ability to simultaneously assess activity changes for multiple TFs, and surprisingly little is known about whether a TF acts as a repressor or an activator. To this end, we introduce diffTF, a widely applicable genome-wide method to assess differential TF binding activity and to classify TFs as activators or repressors by integrating any type of genome-wide chromatin data with RNA-Seq data and in-silico predicted TF binding sites.
For a graphical summary of the idea, see the section :ref:`workflow`.
The paper is also available on *bioRxiv*; please see it for all methodological details:
`Quantification of differential transcription factor activity and multiomic-based classification into activators and repressors: diffTF <https://www.biorxiv.org/content/early/2018/07/13/368498>`_.
Help, contribute and contact
============================
If you have questions or comments, feel free to contact us. We will be happy to answer any questions related to this project as well as questions related to the software implementation. For method-related questions, contact Judith B. Zaugg (judith.zaugg@embl.de) or Ivan Berest (berest@embl.de). For technical questions, contact Christian Arnold (christian.arnold@embl.de).
If you have questions, doubts, ideas or problems, please use the `Bitbucket Issue Tracker <https://bitbucket.org/chrarnold/diffTF>`_. We will respond in a timely manner.
Citation
============================
If you use this software, please cite the following reference:
Ivan Berest*, Christian Arnold*, Armando Reyes-Palomares, Giovanni Palla, Kasper Dindler Rasmussen, Kristian Helin & Judith B. Zaugg. *Quantification of differential transcription factor activity and multiomics-based classification into activators and repressors: diffTF*. 2018. In review.
The paper is also available on *bioRxiv*; please see it for all methodological details:
`Quantification of differential transcription factor activity and multiomic-based classification into activators and repressors: diffTF <https://www.biorxiv.org/content/early/2018/12/01/368498>`_.
.. _changelog:
Change log
============================
SOON: Version 1.2 (2018-12-XX)
- The Snakemake / *diffTF* pipeline can now be combined with **Singularity**. Singularity is similar to Docker and provides a containerization approach. This has significant implications for users: except for Snakemake and Singularity, no other tools, R, or R packages have to be installed before using *diffTF*, which makes installing *diffTF* much easier and completely independent of the underlying operating system. We now provide two Singularity containers with all necessary tools and packages that are automatically integrated into the workflow. See the sections :ref:`docs-singularityNotes` and :ref:`docs-quickstart` for more details. **Please note that for this to work reliably, Snakemake must be updated to at least version 5.3.1**.
Version 1.1.8 (2018-11-07)
- changed the call to the ``mlv`` function from the ``modeest`` package due to a breaking implementation change in version 2.3.2, which was published at the end of October 2018. *diffTF* now checks the installed ``modeest`` version and calls the function accordingly.
Version 1.1.7 (2018-10-25)
- the default value of the minimum number of data points for a CG bin to be included has been raised from 5 to 20 to make the variance calculation more reliable
- various small updates to the ``summaryFinal.R`` script
Version 1.1.6 (2018-10-11)
- fixed small issue in ``checkParameterValidity.R`` when not having sufficient permissions for the folder in which the fasta file is located
- updated the ``summaryFinal.R`` script. The Volcano plot PDF now also contains plots based on the raw p-values at the end, in addition to those based on the adjusted p-values. This might be helpful for datasets with a small signal when no adjusted p-value is significant. In addition, labeling of TFs is now skipped when the number of TFs to label exceeds 150. This makes the step faster and the PDF smaller and less crowded.
- small updates to the translation table for mm10
- added two local rules to the Snakefile for potential minor speed improvements when running in cluster mode
Version 1.1.5 (2018-08-14)
- optimized ``checkParameterValidity.R`` script, only TFBS files for TFs included in the analysis are now checked
- addressed an R library compatibility issue independent of *diffTF* that users reported. In some cases, for particular versions of R and Bioconductor, R exited with a *segfault* (memory not mapped) error in the ``checkParameterValidity.R`` script, which seems to be caused by the combination of *DiffBind* and *DESeq2*. Specifically, when *DiffBind* is loaded *before* *DESeq2*, R crashes with a segmentation fault upon exiting, whereas loading *DiffBind* *after* *DESeq2* causes no issue. If there are further issues, please let us know. Thanks to Gyan Prakash Mishra, who first reported this.
- fixed an issue when the number of peaks is very small so that some TFs have no overlapping TFBS at all in the peak regions. This caused the rule ``intersectTFBSAndBAM`` to exit with an error due to grep's policy of returning exit code 1 if no matches are returned (thanks to Jonas Ungerbeck, again).
- removed the ``--timestamp`` option in the helper script ``startAnalysis.sh`` because this option was removed in Snakemake versions newer than 5.2.1
- Documentation updates
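
As background to the ``intersectTFBSAndBAM`` fix above: ``grep`` exits with status 1 when no lines match, which a shell running with ``set -e`` (as in Snakemake's default Bash strict mode) treats as a failure. The following is a generic sketch of one common workaround, not necessarily the change made in *diffTF*; ``grep_or_empty`` is a hypothetical name.

.. code-block:: Bash

   # Hypothetical wrapper (not part of diffTF): behaves like grep, but treats
   # "no lines selected" (exit status 1) as success. Real errors, such as an
   # unreadable file (exit status 2), still cause a non-zero return value.
   grep_or_empty() {
      local status=0
      grep "$@" || status=$?
      [ "$status" -le 1 ]
   }

The design choice here is to distinguish "no matches" from genuine errors instead of appending a blanket ``|| true``, which would also hide a missing or unreadable input file.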
Version 1.1.4 (2018-08-09)
- minor, updated the ``checkParameterValidity.R`` script and the documentation (one package was not mentioned)
Version 1.1.3 (2018-08-06)
- minor, fixed a small issue in the Volcano plot (the legends were wrong and the plot background was not colored properly)
Version 1.1.2 (2018-08-03)
- fixed a bug that made the ``3.analyzeTF.R`` script fail when the number of permutations was changed during the analysis or when the value is higher than the actual maximum number (thanks to Jonas Ungerbeck)
Version 1.1.1 (2018-08-01)
- Documentation updates (referenced the bioRxiv paper, extended the section about errors)
- updated the information on how to load the snakemake object into the R workspace in the corresponding R scripts
- fixed a bug that made the labels in the Volcano plot switch sides (thanks to Jonas Ungerbeck)
- merged some diagnostic plots for the AR classification in the last step
- renamed R scripts and R log files to make them consistent with the cluster output and error files
Version 1.1 (2018-07-27)
- added a new parameter ``dir_TFBS_sorted`` in the config file to specify that the TFBS input files are already sorted, which saves some computation time by not resorting them
- updated the TFBS files that are available via download (some files were not presorted correctly)
- added support for single-end BAM files. There is a new parameter ``pairedEnd`` in the config file that specifies whether reads are paired-end or not.
- restructured some of the permutation-related output files to save space and computation time. The rule ``concatenateMotifsPerm`` should now be much faster, and the TF-specific ``...outputPerm.tsv.gz`` files are now much smaller due to an improved column structure
Version 1.0.1 (2018-07-25)
- fixed a bug in ``2.DiffPeaks.R`` that sometimes caused the step to fail, thanks to Jonas Ungerbeck for letting us know
- fixed a bug in ``3.analyzeTF.R`` for rare corner cases when *DESeq* fails
Version 1.0 (2018-07-01)
- released stable version
License
============================
diffTF is licensed under the MIT License:

.. literalinclude:: ../LICENSE.md
   :language: text