Commit 0c4e2c5a authored by Christian Arnold

GRaNIE: New version, added TF enrichment

parent b09d24c1
Pipeline #28197 passed with stage
in 20 seconds
Package: GRaNIE
Title: GRaNIE: Reconstruction cell type specific gene regulatory networks including enhancers using chromatin accessibility and RNA-seq data
Version: 0.14.3
Version: 0.14.4
Encoding: UTF-8
Authors@R: c(person("Christian", "Arnold", email =
"christian.arnold@embl.de", role = c("cre","aut")),
......
......@@ -4280,6 +4280,13 @@ performAllNetworkAnalyses <- function(GRN, ontology = c("BP", "MF"),
background = "neighborhood",
clustering = "louvain",
communities = seq_len(10), display = "byRank",
TF_rankType = "degree",
TF_topNodes_n = 0.1,
TF_names = NULL,
TF_plot_topn_pvalue = 30,
TF_plot_p = 0.05,
TF_plot_nSignificant = 3,
TF_plot_nGO = 10,
outputFolder = NULL,
forceRerun = FALSE) {
......@@ -4303,6 +4310,17 @@ performAllNetworkAnalyses <- function(GRN, ontology = c("BP", "MF"),
GRN = plotCommunitiesEnrichment(GRN, outputFolder = outputFolder, display = display, communities = communities, forceRerun = forceRerun)
GRN = calculateTFEnrichment(GRN, rankType = TF_rankType, n = TF_topNodes_n, TF.names = TF_names,
ontology = ontology, algorithm = algorithm,
statistic = statistic, background = background,
forceRerun = forceRerun)
GRN = plotTFEnrichment(GRN, rankType = TF_rankType, n = TF_topNodes_n, TF.names = TF_names,
topn_pvalue = TF_plot_topn_pvalue, p = TF_plot_p,
nSignificant = TF_plot_nSignificant, nGO = TF_plot_nGO,
outputFolder = outputFolder, forceRerun = forceRerun)
.printExecutionTime(start)
......@@ -4687,7 +4705,7 @@ calculateCommunitiesEnrichment <- function(GRN,
communitiesDisplay = stats::na.omit(communitiesCount$community[communities])
}
futile.logger::flog.info(paste0("Running enrichment analysis for all communities. This may take a while..."))
futile.logger::flog.info(paste0("Running enrichment analysis for all ", length(communitiesDisplay)," communities. This may take a while..."))
mapping = .getGenomeObject(GRN@config$parameters$genomeAssembly, type = "packageName")
......
......@@ -2599,7 +2599,7 @@ plotTFEnrichment <- function(GRN, rankType = "degree", n = 0.1, TF.names = NULL,
}
# plot the comparative heatmap:
for (ontology in names(purrr::transpose(purrr::transpose(GRN@stats$Enrichment[["byTF"]])$results))){
for (ontologyCur in names(purrr::transpose(purrr::transpose(GRN@stats$Enrichment[["byTF"]])$results))){
enrichmentData = purrr::transpose(purrr::transpose(GRN@stats$Enrichment[["byTF"]])$results)[[ontologyCur]][TFset]
......
<!-- Generated by pkgdown: do not edit by hand -->
<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Page not found (404) • GRaNIE</title>
<!-- jquery --><script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js" integrity="sha256-CSXorXvZcTkaix6Yvo6HppcZGetbYMGWSFlBw8HfCJo=" crossorigin="anonymous"></script><!-- Bootstrap --><link href="https://cdnjs.cloudflare.com/ajax/libs/bootswatch/3.4.0/flatly/bootstrap.min.css" rel="stylesheet" crossorigin="anonymous">
<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.4.1/js/bootstrap.min.js" integrity="sha256-nuL8/2cJ5NDSSwnKD8VqreErSWHtnEP9E7AySL+1ev4=" crossorigin="anonymous"></script><!-- bootstrap-toc --><link rel="stylesheet" href="bootstrap-toc.css">
<script src="bootstrap-toc.js"></script><!-- Font Awesome icons --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/all.min.css" integrity="sha256-mmgLkCYLUQbXn0B1SRqzHar6dCnv9oZFPEC1g1cwlkk=" crossorigin="anonymous">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/v4-shims.min.css" integrity="sha256-wZjR52fzng1pJHwx4aV2AO3yyTOXrcDW7jBpJtTwVxw=" crossorigin="anonymous">
<!-- clipboard.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><!-- headroom.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/headroom.min.js" integrity="sha256-AsUX4SJE1+yuDu5+mAVzJbuYNPHj/WroHuZ8Ir/CkE0=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/jQuery.headroom.min.js" integrity="sha256-ZX/yNShbjqsohH1k95liqY9Gd8uOiE1S4vZc+9KQ1K4=" crossorigin="anonymous"></script><!-- pkgdown --><link href="pkgdown.css" rel="stylesheet">
<script src="pkgdown.js"></script><meta property="og:title" content="Page not found (404)">
<!-- mathjax --><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script><!--[if lt IE 9]>
<!-- jquery -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js" integrity="sha256-CSXorXvZcTkaix6Yvo6HppcZGetbYMGWSFlBw8HfCJo=" crossorigin="anonymous"></script>
<!-- Bootstrap -->
<link href="https://cdnjs.cloudflare.com/ajax/libs/bootswatch/3.4.0/flatly/bootstrap.min.css" rel="stylesheet" crossorigin="anonymous" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.4.1/js/bootstrap.min.js" integrity="sha256-nuL8/2cJ5NDSSwnKD8VqreErSWHtnEP9E7AySL+1ev4=" crossorigin="anonymous"></script>
<!-- bootstrap-toc -->
<link rel="stylesheet" href="bootstrap-toc.css">
<script src="bootstrap-toc.js"></script>
<!-- Font Awesome icons -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/all.min.css" integrity="sha256-mmgLkCYLUQbXn0B1SRqzHar6dCnv9oZFPEC1g1cwlkk=" crossorigin="anonymous" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/v4-shims.min.css" integrity="sha256-wZjR52fzng1pJHwx4aV2AO3yyTOXrcDW7jBpJtTwVxw=" crossorigin="anonymous" />
<!-- clipboard.js -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script>
<!-- headroom.js -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/headroom.min.js" integrity="sha256-AsUX4SJE1+yuDu5+mAVzJbuYNPHj/WroHuZ8Ir/CkE0=" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/jQuery.headroom.min.js" integrity="sha256-ZX/yNShbjqsohH1k95liqY9Gd8uOiE1S4vZc+9KQ1K4=" crossorigin="anonymous"></script>
<!-- pkgdown -->
<link href="pkgdown.css" rel="stylesheet">
<script src="pkgdown.js"></script>
<meta property="og:title" content="Page not found (404)" />
<!-- mathjax -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script>
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]--><!-- Global site tag (gtag.js) - Google Analytics --><script async src="https://www.googletagmanager.com/gtag/js?id=G-530L9SXFM1"></script><script>
<![endif]-->
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-530L9SXFM1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-530L9SXFM1');
</script>
</head>
<body data-spy="scroll" data-target="#toc">
</head>
<body data-spy="scroll" data-target="#toc">
<div class="container template-title-body">
<header><div class="navbar navbar-default navbar-fixed-top" role="navigation">
<header>
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false">
......@@ -38,13 +81,13 @@
</button>
<span class="navbar-brand">
<a class="navbar-link" href="index.html">GRaNIE</a>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="">0.14.3</span>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Released version">0.14.4</span>
</span>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li>
<li>
<a href="index.html"></a>
</li>
<li>
......@@ -57,7 +100,7 @@
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu">
<li>
<li>
<a href="articles/quickStart.html">Getting Started</a>
</li>
<li>
......@@ -75,17 +118,19 @@
<a href="news/index.html">Changelog &amp; News</a>
</li>
</ul>
<ul class="nav navbar-nav navbar-right"></ul>
</div>
<!--/.nav-collapse -->
</div>
<!--/.container -->
</div>
<!--/.navbar -->
<ul class="nav navbar-nav navbar-right">
</ul>
</div><!--/.nav-collapse -->
</div><!--/.container -->
</div><!--/.navbar -->
</header><div class="row">
</header>
<div class="row">
<div class="contents col-md-9">
<div class="page-header">
<h1>Page not found (404)</h1>
......@@ -96,31 +141,31 @@ Content not found. Please use links in the navbar.
</div>
<div class="col-md-3 hidden-xs hidden-sm" id="pkgdown-sidebar">
<nav id="toc" data-toggle="toc" class="sticky-top"><h2 data-toc-skip>Contents</h2>
<nav id="toc" data-toggle="toc" class="sticky-top">
<h2 data-toc-skip>Contents</h2>
</nav>
</div>
</div>
</div>
<footer><div class="copyright">
<p></p>
<p>Developed by Christian Arnold, Judith Zaugg, Rim Moussa.</p>
<footer>
<div class="copyright">
<p>Developed by Christian Arnold, Judith Zaugg, Rim Moussa.</p>
</div>
<div class="pkgdown">
<p></p>
<p>Site built with <a href="https://pkgdown.r-lib.org/" class="external-link">pkgdown</a> 2.0.1.</p>
<p>Site built with <a href="https://pkgdown.r-lib.org/">pkgdown</a> 1.6.1.</p>
</div>
</footer>
</div>
</div>
</body>
</html>
......@@ -25,8 +25,6 @@
</script>
</head>
<body data-spy="scroll" data-target="#toc">
<div class="container template-article">
<header><div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
......@@ -39,7 +37,7 @@
</button>
<span class="navbar-brand">
<a class="navbar-link" href="../index.html">GRaNIE</a>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="">0.14.3</span>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Released version">0.14.4</span>
</span>
</div>
......@@ -86,13 +84,13 @@
</header><div class="row">
</header><script src="Introduction_files/header-attrs-2.11/header-attrs.js"></script><div class="row">
<div class="col-md-9 contents">
<div class="page-header toc-ignore">
<h1 data-toc-skip>Introduction and Methodological Details</h1>
<h4 data-toc-skip class="author">Christian Arnold, Judith Zaugg</h4>
<h4 class="author">Christian Arnold, Judith Zaugg</h4>
<h4 data-toc-skip class="date">15 December 2021</h4>
<h4 class="date">18 December 2021</h4>
<div class="hidden name"><code>Introduction.Rmd</code></div>
......@@ -105,9 +103,9 @@
<p>This vignette introduces the <code>GRaNIE</code> package and explains the main features, methods and necessary background.</p>
</div>
<div class="section level1">
<h1 id="motivation">Motivation and Necessity<a class="anchor" aria-label="anchor" href="#motivation"></a>
</h1>
<div id="motivation" class="section level1">
<h1 class="hasAnchor">
<a href="#motivation" class="anchor"></a>Motivation and Necessity</h1>
<!-- <div align="center"> -->
<!-- <figure> -->
<!-- <img src="figs/Logo.png" height="200px"/> -->
......@@ -118,18 +116,18 @@
<p>Genetic variants associated with diseases often affect non-coding regions, thus likely having a regulatory role. To understand the effects of genetic variants in these regulatory regions, identifying genes that are modulated by specific regulatory elements (REs) is crucial. The effect of gene regulatory elements, such as enhancers, is often cell-type specific, likely because the combinations of transcription factors (TFs) that are regulating a given enhancer have cell-type-specific activity. This TF activity can be quantified with existing tools such as <em>diffTF</em> and captures differences in binding of a TF in open chromatin regions. Collectively, this forms a gene regulatory network (GRN) with cell-type and data-specific TF-RE and RE-gene links. Here, we reconstruct such a GRN using bulk RNA-seq and open chromatin data (e.g., from ATAC-seq or ChIP-seq for open chromatin marks) and, optionally, TF activity data. Our network contains different types of links, connecting TFs to regulatory elements, which in turn are connected to genes in the vicinity or within the same chromatin domain (TAD). We use a statistical framework to assign empirical FDRs and weights to all links using a permutation-based approach.</p>
<p>In summary, we present a framework to reconstruct predictive enhancer-mediated regulatory network models that are based on integrating expression and chromatin accessibility/activity patterns across individuals, and provide a comprehensive resource of cell-type specific gene regulatory networks for particular cell types.</p>
</div>
<div class="section level1">
<h1 id="installation">Installation and Example Workflow<a class="anchor" aria-label="anchor" href="#installation"></a>
</h1>
<div id="installation" class="section level1">
<h1 class="hasAnchor">
<a href="#installation" class="anchor"></a>Installation and Example Workflow</h1>
<p>Please see the <a href="quickStart.html">quick start vignette for how to install our <code>GRaNIE</code> package(s)</a> and the <a href="workflow.html">workflow vignette for an example workflow</a>.</p>
</div>
<div class="section level1">
<h1 id="input">Input<a class="anchor" aria-label="anchor" href="#input"></a>
</h1>
<div id="input" class="section level1">
<h1 class="hasAnchor">
<a href="#input" class="anchor"></a>Input</h1>
<p>In our <code>GRN</code> approach, we integrate multiple data modalities. Here, we describe them in detail and their required format.</p>
<div class="section level2">
<h2 id="input_peaks">Open chromatin and RNA-seq data<a class="anchor" aria-label="anchor" href="#input_peaks"></a>
</h2>
<div id="input_peaks" class="section level2">
<h2 class="hasAnchor">
<a href="#input_peaks" class="anchor"></a>Open chromatin and RNA-seq data</h2>
<p>Open chromatin data may come from ATAC-seq, DNase-seq or ChIP-seq data for particular histone modifications that associate with open chromatin such as histone acetylation (e.g., H3K27ac). They all capture open chromatin either directly or indirectly, and while we primarily tested and used ATAC-seq while developing the package, the others should also be applicable to our framework. <em>From here on, we will refer to these regions simply as peaks.</em></p>
<p>For RNA-seq, the data represent expression counts per gene across samples.</p>
<p>Here is a quick graphical representation of the format that is required to be compatible with our framework:</p>
......@@ -141,7 +139,7 @@
<ul>
<li>The name of the ID column can be anything and can be specified later in the pipeline. For peaks, we usually use <code>peakID</code> while for RNA-seq, we use <code>EnsemblID</code>
</li>
<li>for peaks, the required format is “chr:start-end”, with <code>chr</code> denoting the chromosome, followed by <code>:</code>, and then <code>start</code>, <code>-</code>, and <code>end</code> for the peak start and end, respectively.</li>
<li>for peaks, the required format is “chr:start-end”, with <code>chr</code> denoting the chromosome, followed by <code><a href="https://rdrr.io/r/base/Colon.html">:</a></code>, and then <code>start</code>, <code><a href="https://rdrr.io/r/base/Arithmetic.html">-</a></code>, and <code>end</code> for the peak start and end, respectively.</li>
</ul>
</li>
<li>counts should be raw if possible (that is, integers), but we also support pre-normalized data. <a href="#methods_dataNorm">See here for more information.</a>
......@@ -151,9 +149,9 @@
<p>Note that peaks must not overlap. If they do, an informative error message is thrown and the user is requested to modify the peak input data so that no overlaps exist among all peaks. This can be done by either merging overlapping peaks or deleting those that overlap with other peaks based on other criteria such as peak signal, by keeping only the strongest peak, for example.</p>
<p>For guidelines on how many peaks are necessary or recommended, see <a href="#guidelines">the section below</a>.</p>
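<p>The peak ID format and the no-overlap requirement described above can be checked before handing data to the package. The following is a minimal base R sketch, not the package's own validation code: the helper names <code>parsePeakIDs</code> and <code>findOverlappingPeaks</code> are hypothetical, and treating coordinates as closed intervals (a shared base counts as an overlap) is an assumption.</p>

```r
# Hypothetical helper (not part of GRaNIE): split "chr:start-end" peak IDs
# into their components and stop on malformed IDs.
parsePeakIDs <- function(peakIDs) {
  pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"
  valid <- grepl(pattern, peakIDs)
  if (!all(valid)) {
    stop("Malformed peak IDs: ", paste(peakIDs[!valid], collapse = ", "))
  }
  data.frame(
    peakID = peakIDs,
    chr    = sub(pattern, "\\1", peakIDs),
    start  = as.integer(sub(pattern, "\\2", peakIDs)),
    end    = as.integer(sub(pattern, "\\3", peakIDs)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: return the IDs of peaks that overlap an earlier peak
# on the same chromosome. Assumption: closed intervals, so start <= previous
# end counts as an overlap.
findOverlappingPeaks <- function(peaks) {
  peaks <- peaks[order(peaks$chr, peaks$start, peaks$end), ]
  overlapping <- character(0)
  for (chrCur in unique(peaks$chr)) {
    p <- peaks[peaks$chr == chrCur, ]
    if (nrow(p) < 2) next
    # Largest end coordinate among all peaks sorted before the current one
    maxEndBefore <- cummax(p$end)[-nrow(p)]
    hit <- p$start[-1] <= maxEndBefore
    overlapping <- c(overlapping, p$peakID[-1][hit])
  }
  overlapping
}

peaks <- parsePeakIDs(c("chr1:100-500", "chr1:450-900", "chr2:100-500"))
overlapping <- findOverlappingPeaks(peaks)
print(overlapping)
```

<p>Peaks reported by such a check would then be merged or dropped, for example by keeping only the strongest peak, before running the pipeline.</p>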
</div>
<div class="section level2">
<h2 id="input_TF">TF and TFBS data<a class="anchor" aria-label="anchor" href="#input_TF"></a>
</h2>
<div id="input_TF" class="section level2">
<h2 class="hasAnchor">
<a href="#input_TF" class="anchor"></a>TF and TFBS data</h2>
<p>TF and TFBS data is mandatory as input. Specifically, the package requires a <code>bed</code> file per TF with TF binding sites (TFBS). TFBS can either be in-silico predicted, or experimentally verified, as long as genome-wide TFBS can be used. For convenience and orientation, we provide TFBS predictions for HOCOMOCO-based TF motifs that were used with <code>PWMScan</code> for <code>hg19</code>, <code>hg38</code> and <code>mm10</code>. Check the <a href="workflow.html">workflow vignette for an example</a>.</p>
<p>However, you may also use your own TFBS data, and we provide full flexibility in doing so. Only some manual preparation is necessary. Briefly, if you decide to use your own TFBS data, you have to prepare the following:</p>
<ul>
......@@ -163,56 +161,56 @@
</ul>
<p>For more methodological details, including how to construct these files and their exact format, we refer to the <code>diffTF</code> paper.</p>
</div>
<div class="section level2">
<h2 id="input_metadata">Sample metadata (optional but highly recommended)<a class="anchor" aria-label="anchor" href="#input_metadata"></a>
</h2>
<div id="input_metadata" class="section level2">
<h2 class="hasAnchor">
<a href="#input_metadata" class="anchor"></a>Sample metadata (optional but highly recommended)</h2>
<p>Providing sample metadata is optional, but highly recommended - if available, the sample metadata is integrated into the PCA plots to understand where the variation in the data comes from and whether any of the metadata (e.g., age, sex, sequencing batch) is associated with the PCs from the PCA, indicating a batch effect that needs to be addressed before running the <code>GRaNIE</code> pipeline.</p>
<p>The integration of sample metadata is done in the <code>addData</code> function, see <code><a href="../reference/addData.html">?addData</a></code> for more information.</p>
</div>
<div class="section level2">
<h2 id="input_HiC">Hi-C data (optional)<a class="anchor" aria-label="anchor" href="#input_HiC"></a>
</h2>
<div id="input_HiC" class="section level2">
<h2 class="hasAnchor">
<a href="#input_HiC" class="anchor"></a>Hi-C data (optional)</h2>
<p>Integration of Hi-C data is optional and serves as an alternative to identifying peak-gene pairs to test for correlation based on a predefined and fixed <em>neighborhood</em> size (see <a href="#methods_peakGene">Methods</a>).</p>
<p>If Hi-C data are available, the pipeline expects a BED file format with at least 3 columns: chromosome name, start, and end. An ID column is optional and assumed to be in the 4th column, all additional columns are ignored.</p>
<p>For more details, see the R help (<code><a href="../reference/addConnections_peak_gene.html">?addConnections_peak_gene</a></code>) and the <a href="#methods_peakGene">Methods</a>.</p>
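<p>To illustrate the expected BED layout, the following base R sketch writes a small tab-separated file and reads it back, keeping the first three columns plus an optional ID from the 4th column and ignoring everything else. The helper name <code>readTADs</code>, the example coordinates, and the fallback of generating an ID when no 4th column exists are assumptions made for this illustration; this is not how the package itself imports the file.</p>

```r
# Hypothetical helper (not part of GRaNIE): read a TAD BED file with at least
# 3 columns (chromosome, start, end) and an optional ID in column 4.
readTADs <- function(file) {
  bed <- utils::read.table(file, header = FALSE, sep = "\t",
                           stringsAsFactors = FALSE, comment.char = "#")
  if (ncol(bed) < 3) {
    stop("A BED file needs at least 3 columns: chromosome, start, end")
  }
  tads <- bed[, 1:3]
  colnames(tads) <- c("chr", "start", "end")
  # Assumption for this sketch: build an ID from the coordinates if column 4
  # is missing. All columns beyond the 4th are ignored.
  tads$ID <- if (ncol(bed) >= 4) {
    as.character(bed[[4]])
  } else {
    paste0(tads$chr, ":", tads$start, "-", tads$end)
  }
  tads
}

# Made-up example data with one extra column that is ignored
bedFile <- tempfile(fileext = ".bed")
writeLines(c("chr1\t1000\t500000\tTAD_1\t0.8",
             "chr1\t500000\t900000\tTAD_2\t0.5"), bedFile)
tads <- readTADs(bedFile)
print(tads)
```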
</div>
<div class="section level2">
<h2 id="input_SNP">SNP data (optional, coming soon)<a class="anchor" aria-label="anchor" href="#input_SNP"></a>
</h2>
<div id="input_SNP" class="section level2">
<h2 class="hasAnchor">
<a href="#input_SNP" class="anchor"></a>SNP data (optional, coming soon)</h2>
<p>We also plan to integrate SNP data soon, stay tuned!</p>
</div>
</div>
<div class="section level1">
<h1 id="methods">Methodological Details and Basic Mode of Action<a class="anchor" aria-label="anchor" href="#methods"></a>
</h1>
<div id="methods" class="section level1">
<h1 class="hasAnchor">
<a href="#methods" class="anchor"></a>Methodological Details and Basic Mode of Action</h1>
<p>In this section, we give methodological details and guidelines.</p>
<div class="section level2">
<h2 id="methods_dataNorm">Data normalization<a class="anchor" aria-label="anchor" href="#methods_dataNorm"></a>
</h2>
<div id="methods_dataNorm" class="section level2">
<h2 class="hasAnchor">
<a href="#methods_dataNorm" class="anchor"></a>Data normalization</h2>
<p>An important consideration is data normalization for RNA and open chromatin data. We currently support three choices of normalization for either peak or RNA-Seq data: <code>quantile</code>, <code>DESeq_sizeFactor</code> and <code>none</code>; we refer to the R help for more details (<code><a href="../reference/addData.html">?addData</a></code>). The default for RNA-Seq is a quantile normalization, while for the open chromatin peak data, it is <code>DESeq_sizeFactor</code> (i.e., a “regular” <code>DESeq</code> size factor normalization). Importantly, <code>DESeq_sizeFactor</code> requires raw data, while <code>quantile</code> does not necessarily. We nevertheless recommend raw data as input, although it is also possible to provide pre-normalized data as input and then apply another normalization method or “none” on top of it.</p>
<p>While we recommend raw counts for both peaks and RNA-Seq as input and offer several normalization choices in the pipeline, it is also possible to provide pre-normalized data. Note that the normalization method may have a large influence on the resulting <code>eGRN</code> network, so make sure the choice of normalization is reasonable. For more details, see the next sections.</p>
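<p>To make the difference between the two normalization families concrete, here is a simplified base R sketch of a quantile normalization and of a median-of-ratios size factor normalization (the idea behind <code>DESeq</code> size factors). The function names <code>quantileNormalize</code> and <code>sizeFactorNormalize</code> are hypothetical, ties are handled naively, and the code is only an illustration under these assumptions; the package's own implementation may differ in its details.</p>

```r
# Hypothetical sketch: quantile normalization of a features x samples matrix.
# Every sample is forced onto the same reference distribution (the mean of the
# sorted columns). Ties are broken by order of appearance, a simplification.
quantileNormalize <- function(counts) {
  ranks  <- apply(counts, 2, rank, ties.method = "first")
  sorted <- apply(counts, 2, sort)
  refDistribution <- rowMeans(sorted)
  normalized <- apply(ranks, 2, function(r) refDistribution[r])
  dimnames(normalized) <- dimnames(counts)
  normalized
}

# Hypothetical sketch: median-of-ratios size factors on raw counts.
# Features with a zero count in any sample are excluded from the estimate.
sizeFactorNormalize <- function(counts) {
  logGeoMeans <- rowMeans(log(counts))
  usable <- is.finite(logGeoMeans)
  sizeFactors <- apply(counts, 2, function(x) {
    exp(stats::median(log(x[usable]) - logGeoMeans[usable]))
  })
  sweep(counts, 2, sizeFactors, "/")
}

# Made-up raw counts: sample 2 was sequenced exactly twice as deeply
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80), ncol = 2)
qn <- quantileNormalize(counts)
sf <- sizeFactorNormalize(counts)
print(qn)
print(sf)
```

<p>In this toy example both methods remove the depth difference between the two samples, but they do so differently: the size factor approach rescales each sample by a single factor, whereas quantile normalization replaces the values by a shared reference distribution.</p>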
</div>
<div class="section level2">
<h2 id="methods_permutedData">Permutations<a class="anchor" aria-label="anchor" href="#methods_permutedData"></a>
</h2>
<div id="methods_permutedData" class="section level2">
<h2 class="hasAnchor">
<a href="#methods_permutedData" class="anchor"></a>Permutations</h2>
<p>RNA-Seq is shuffled, this is permutation 1. TODO: More</p>
</div>
<div class="section level2">
<h2 id="methods_TF_peak">TF-peak connections<a class="anchor" aria-label="anchor" href="#methods_TF_peak"></a>
</h2>
<div class="section level3">
<h3 id="methods_TF_peak_build">Establishing TF-peak links<a class="anchor" aria-label="anchor" href="#methods_TF_peak_build"></a>
</h3>
<div id="methods_TF_peak" class="section level2">
<h2 class="hasAnchor">
<a href="#methods_TF_peak" class="anchor"></a>TF-peak connections</h2>
<div id="methods_TF_peak_build" class="section level3">
<h3 class="hasAnchor">
<a href="#methods_TF_peak_build" class="anchor"></a>Establishing TF-peak links</h3>
<p>TODO: Describe how we establish TF-peak links</p>
</div>
<div class="section level3">
<h3 id="methods_TF_peak_TFActivity">TF Activity connections<a class="anchor" aria-label="anchor" href="#methods_TF_peak_TFActivity"></a>
</h3>
<div id="methods_TF_peak_TFActivity" class="section level3">
<h3 class="hasAnchor">
<a href="#methods_TF_peak_TFActivity" class="anchor"></a>TF Activity connections</h3>
<p>As explained above, TF-peak connections are found by correlating TF <em>expression</em> with peak accessibility. In addition to <em>expression</em>, we also offer to identify statistically significant TF-peak links based on <em>TF Activity</em> and not expression of the TFs. The concept of TF Activity is described in more detail in our <em>diffTF</em> paper. In short, we define TF motif activity, or TF activity for short, as the effect of a TF on the state of chromatin as measured by chromatin accessibility or active chromatin marks (i.e., ATAC-seq, DNase sequencing [DNase-seq], or histone H3 lysine 27 acetylation [H3K27ac] ChIP-seq). A <em>TF Activity</em> score is therefore needed <em>for each TF and each sample</em>.</p>
<p>TF Activity information can either be calculated within the <code>GRaNIE</code> framework <a href="#methods_TF_peak_TFActivity_calculating">using a simplified and empirical approach</a> or it can be calculated outside of our framework using designated methods and then <a href="#methods_TF_peak_TFActivity_importing">imported into our framework</a>. We now describe these two choices in more detail.</p>
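<p>The correlation step itself is the same regardless of whether the per-sample TF values are expression values or TF Activity scores. The following base R sketch illustrates this with made-up numbers; the helper <code>correlateTFWithPeaks</code>, the use of a Pearson correlation, and the connection type labels are assumptions for illustration and are not taken from the package code.</p>

```r
# Hypothetical helper (not part of GRaNIE): correlate one TF's per-sample
# values with the accessibility of every peak across the same samples.
# Assumption: Pearson correlation, the default of stats::cor.
correlateTFWithPeaks <- function(tfValues, peakMatrix, connectionType) {
  stopifnot(identical(names(tfValues), colnames(peakMatrix)))
  r <- apply(peakMatrix, 1, function(peak) stats::cor(tfValues, peak))
  data.frame(peak = rownames(peakMatrix),
             r = unname(r),
             connectionType = connectionType,
             stringsAsFactors = FALSE)
}

# Made-up example: 4 samples, 2 peaks
samples <- c("S1", "S2", "S3", "S4")
peakMatrix <- rbind("chr1:100-500"   = c(2, 4, 6, 8),
                    "chr1:1000-1500" = c(4, 3, 2, 1))
colnames(peakMatrix) <- samples
tfExpression <- c(S1 = 1, S2 = 2, S3 = 3, S4 = 4)

links <- correlateTFWithPeaks(tfExpression, peakMatrix, "expression")
print(links)
```

<p>Passing a vector of TF Activity scores instead of <code>tfExpression</code>, together with a different connection type label, would produce the second kind of TF-peak link in the same way.</p>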
<div class="section level4">
<h4 id="methods_TF_peak_TFActivity_calculating">Calculating TF Activity<a class="anchor" aria-label="anchor" href="#methods_TF_peak_TFActivity_calculating"></a>
</h4>
<div id="methods_TF_peak_TFActivity_calculating" class="section level4">
<h4 class="hasAnchor">
<a href="#methods_TF_peak_TFActivity_calculating" class="anchor"></a>Calculating TF Activity</h4>
<p>In our <em>GRaNIE</em> approach, we empirically estimate TF Activity for each TF with the following approach:</p>
<ul>
<li>normalize the raw peak counts by one of the supported normalization methods (see below)</li>
......@@ -229,53 +227,53 @@
<li>No normalization</li>
</ol>
</div>
<div class="section level4">
<h4 id="methods_TF_peak_TFActivity_importing">Importing TF Activity<a class="anchor" aria-label="anchor" href="#methods_TF_peak_TFActivity_importing"></a>
</h4>
<div id="methods_TF_peak_TFActivity_importing" class="section level4">
<h4 class="hasAnchor">
<a href="#methods_TF_peak_TFActivity_importing" class="anchor"></a>Importing TF Activity</h4>
<p>Soon, it will also be possible to import TF Activity data into our framework as opposed to calculating it using the procedure as described above. This feature is currently in development and will be available soon.</p>
</div>
<div class="section level4">
<h4 id="methods_TF_peak_TFActivity_adding">Adding TF Activity TF-peak connections<a class="anchor" aria-label="anchor" href="#methods_TF_peak_TFActivity_adding"></a>
</h4>
<div id="methods_TF_peak_TFActivity_adding" class="section level4">
<h4 class="hasAnchor">
<a href="#methods_TF_peak_TFActivity_adding" class="anchor"></a>Adding TF Activity TF-peak connections</h4>
<p>Once TF Activity data is available, finding TF-peak links and assessing their significance is then done in complete analogy to the TF expression data - just the input data is different (TF Activity as opposed to TF expression). The so-called connection type - <em>expression</em> or <em>TF Activity</em> - is stored in the <em>GRN</em> object and output tables and therefore allows tailoring and filtering the resulting network accordingly. All output PDFs also contain the information whether a TF-peak link has been established via <em>TF expression</em> or <em>TF Activity</em>.</p>
</div>
</div>
</div>
<div class="section level2">
<h2 id="methods_peakGene">Peak-gene associations<a class="anchor" aria-label="anchor" href="#methods_peakGene"></a>
</h2>
<div id="methods_peakGene" class="section level2">
<h2 class="hasAnchor">
<a href="#methods_peakGene" class="anchor"></a>Peak-gene associations</h2>
<p>We offer two options of where in the gene the overlap with the extended peak may occur: at the 5’ end of the gene (the default) or anywhere in the gene. For more information, see the R help (<code><a href="../reference/addConnections_peak_gene.html">?addConnections_peak_gene</a></code> and the parameter <code>overlapTypeGene</code> in particular).</p>
<div class="section level3">
<h3 id="two-approaches-for-identifying-peak-gene-pairs-to-test-for-correlation">Two approaches for identifying peak-gene pairs to test for correlation<a class="anchor" aria-label="anchor" href="#two-approaches-for-identifying-peak-gene-pairs-to-test-for-correlation"></a>
</h3>
<div id="two-approaches-for-identifying-peak-gene-pairs-to-test-for-correlation" class="section level3">
<h3 class="hasAnchor">
<a href="#two-approaches-for-identifying-peak-gene-pairs-to-test-for-correlation" class="anchor"></a>Two approaches for identifying peak-gene pairs to test for correlation</h3>
<p>We offer two options to decide which peak-gene pairs to test for correlation: in the absence of additional topologically associating domain (TAD) data from Hi-C or similar approaches, the pipeline uses a local neighborhood-based approach with a custom neighborhood size (default: 250 kb up- and downstream of the peak) to select peak-gene pairs to test. In the presence of TAD data, all peak-gene pairs within a TAD are tested, while peaks located outside of any TAD are ignored. The user has furthermore the choice to specify whether overlapping TADs should be merged or not.</p>
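<p>The neighborhood-based selection can be sketched in a few lines of base R: each peak is extended by the neighborhood size on both sides, and every gene whose 5’ end falls into the extended region on the same chromosome forms a candidate pair. The helper <code>findPeakGenePairs</code>, the column names, and the example coordinates are hypothetical; this is an illustration of the idea, not the package's implementation.</p>

```r
# Hypothetical helper (not part of GRaNIE): pair each peak with all genes whose
# 5' end (column tss) lies within neighborhoodSize bp up- or downstream of the
# peak on the same chromosome.
findPeakGenePairs <- function(peaks, genes, neighborhoodSize = 250000) {
  # All peak-gene combinations that share a chromosome
  pairs <- merge(peaks, genes, by = "chr")
  keep <- pairs$tss >= pairs$start - neighborhoodSize &
          pairs$tss <= pairs$end + neighborhoodSize
  pairs <- pairs[keep, c("peakID", "geneID")]
  rownames(pairs) <- NULL
  pairs
}

# Made-up example data
peaks <- data.frame(peakID = "chr1:1000000-1000500", chr = "chr1",
                    start = 1000000, end = 1000500,
                    stringsAsFactors = FALSE)
genes <- data.frame(geneID = c("G1", "G2", "G3"),
                    chr = c("chr1", "chr1", "chr2"),
                    tss = c(1200000, 1300000, 1000100),
                    stringsAsFactors = FALSE)

pairs <- findPeakGenePairs(peaks, genes)
print(pairs)
```

<p>Here only <code>G1</code> is paired with the peak: <code>G2</code> lies more than 250 kb away and <code>G3</code> is on a different chromosome. With TAD data, the extended peak region would be replaced by the boundaries of the TAD that contains the peak.</p>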
</div>
</div>
</div>
<div class="section level1">
<h1 id="guidelines">Guidelines, Recommendations, Limitations, Scope<a class="anchor" aria-label="anchor" href="#guidelines"></a>
</h1>
<div id="guidelines" class="section level1">
<h1 class="hasAnchor">
<a href="#guidelines" class="anchor"></a>Guidelines, Recommendations, Limitations, Scope</h1>
<p>In this section, we provide a few guidelines and recommendations that may be helpful for your analysis.</p>
<div class="section level2">
<h2 id="guidelines_scope">Package scope<a class="anchor" aria-label="anchor" href="#guidelines_scope"></a>
</h2>
<div id="guidelines_scope" class="section level2">
<h2 class="hasAnchor">
<a href="#guidelines_scope" class="anchor"></a>Package scope</h2>
<p>In this section, we want to explicitly mention the designated scope of the <code>GRaNIE</code> package, its limitations and additional / companion packages that may be used subsequently or beforehand.</p>
<p>Coming soon.</p>
</div>
<div class="section level2">
<h2 id="guidelines_TFBS">Transcription factor binding sites (TFBS)<a class="anchor" aria-label="anchor" href="#guidelines_TFBS"></a>
</h2>
<div id="guidelines_TFBS" class="section level2">
<h2 class="hasAnchor">
<a href="#guidelines_TFBS" class="anchor"></a>Transcription factor binding sites (TFBS)</h2>
<p>TFBS are a crucial input for any <code>GRaNIE</code> analysis. Our <code>GRaNIE</code> approach is very agnostic as to how these files are generated - as long as one BED file per TF is provided with TFBS positions, the TF can be integrated. As explained above, we usually work with TFBS as predicted by <code>PWMScan</code> based on <code>HOCOMOCO</code> TF motifs, while in-silico predicted TFBS are by no means a requirement of the pipeline. Instead, <code>JASPAR</code> TFBS or TFBS from any other database can also be used. The total number of TFs and of TFBS per TF seems more relevant here, due to the way we integrate TFBS: We create a binary 0/1 overlap matrix for each peak and TF, with 0 indicating that no TFBS for a particular TF overlaps with a particular peak, while 1 indicates that at least 1 TFBS from the TFBS input data does indeed overlap with the particular peak by at least 1 bp. Thus, having more TFBS in general also increases the number of 1s and therefore the <em>foreground</em> of the TF (see the diagnostic plots), while it also makes the foreground more noisy if the TFBS list contains too many false positives. As always in biology, this is a trade-off.</p>
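<p>The binary peak-by-TF overlap matrix described above can be illustrated with a small base R sketch. The helper <code>buildOverlapMatrix</code>, the data layout (one data frame of binding sites per TF), and the closed-interval overlap test are assumptions made for this example; the package itself may compute the overlaps differently.</p>

```r
# Hypothetical helper (not part of GRaNIE): build a peaks x TFs matrix with
# 1 if at least one TFBS of the TF overlaps the peak by at least 1 bp, else 0.
# Assumption: coordinates are closed intervals on both ends.
buildOverlapMatrix <- function(peaks, tfbsList) {
  overlap <- matrix(0L, nrow = nrow(peaks), ncol = length(tfbsList),
                    dimnames = list(peaks$peakID, names(tfbsList)))
  for (tf in names(tfbsList)) {
    sites <- tfbsList[[tf]]
    for (i in seq_len(nrow(peaks))) {
      hit <- sites$chr == peaks$chr[i] &
             sites$start <= peaks$end[i] &
             sites$end >= peaks$start[i]
      overlap[i, tf] <- as.integer(any(hit))
    }
  }
  overlap
}

# Made-up example: 2 peaks, 2 TFs
peaks <- data.frame(peakID = c("chr1:100-500", "chr1:1000-1500"),
                    chr = "chr1", start = c(100, 1000), end = c(500, 1500),
                    stringsAsFactors = FALSE)
tfbsList <- list(
  TF_A = data.frame(chr = "chr1", start = 480, end = 495),
  TF_B = data.frame(chr = "chr1", start = c(2000, 1500), end = c(2010, 1510))
)

overlapMatrix <- buildOverlapMatrix(peaks, tfbsList)
print(overlapMatrix)
```

<p>In this toy example, <code>TF_B</code> touches the second peak with a single base (position 1500), which is already sufficient for a 1 in the matrix. Adding more (or less stringent) TFBS predictions for a TF can only turn additional entries from 0 to 1, which is the trade-off between foreground size and noise mentioned above.</p>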
</div>
<div class="section level2">
<h2 id="guidelines_peaks">Peaks<a class="anchor" aria-label="anchor" href="#guidelines_peaks"></a>
</h2>
<div id="guidelines_peaks" class="section level2">
<h2 class="hasAnchor">
<a href="#guidelines_peaks" class="anchor"></a>Peaks</h2>
<p>The number of peaks that is provided as input matters greatly for the resulting GRN and its connectivity. From our experience, this number should be in a reasonable range so that there is enough data to build a GRN, but not so large that the whole pipeline runs unnecessarily long. We have had good experience with between 50,000 and 200,000 peaks or so, although these are recommendations rather than hard thresholds.</p>
<p>With respect to the recommended width of the peaks, we usually use peaks that range from a couple of hundred base pairs up to a few kb, while the default is to filter peaks if they are wider than 10,000 bp (parameter <code>maxSize_peaks</code> in the function <code>filterData</code>). Remember: peaks are overlapped with TFBS, so if a particular peak is too narrow, the likelihood that it does not overlap with any (predicted) TFBS from any TF increases, and such a peak is subsequently essentially ignored.</p>
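<p>The width filter described above can be sketched in base R as follows. This only illustrates the idea and is not the code of <code>filterData</code>: the toy peak table is made up, the width is computed simply as <code>end - start</code>, and whether the package treats the threshold as inclusive is an assumption here.</p>

```r
# Toy peak table (coordinates are made up)
peaks <- data.frame(id    = c("peakA", "peakB", "peakC"),
                    start = c(1000, 50000, 200000),
                    end   = c(1400, 62500, 203000))

# Same name and default value as the parameter mentioned in the text
maxSize_peaks <- 10000

# Peak width in bp (assumption: simple end - start)
peaks$width <- peaks$end - peaks$start

# Keep only peaks that are not wider than the threshold
peaksFiltered <- peaks[peaks$width <= maxSize_peaks, ]
```

<p>Here, <code>peakB</code> (12,500 bp) is removed, while the two narrower peaks are kept.</p>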
</div>
<div class="section level2">
<h2 id="guidelines_RNA">RNA-Seq<a class="anchor" aria-label="anchor" href="#guidelines_RNA"></a>
</h2>
<div id="guidelines_RNA" class="section level2">
<h2 class="hasAnchor">
<a href="#guidelines_RNA" class="anchor"></a>RNA-Seq</h2>
<p>The following list is subject to change and provides some rough guidelines for the RNA-Seq data:</p>
<ol style="list-style-type: decimal">
<li>We recommend using raw counts if possible, and checking carefully in a PCA whether any batch effects are visible.</li>
......@@ -283,27 +281,27 @@
<li>At the moment, we have not properly tested our framework with single-cell RNA-Seq data and therefore cannot provide support for it. Thus, use regular bulk data until we have made progress on single-cell applicability.</li>
</ol>
</div>
<div class="section level2">
<h2 id="guidelines_peakGene">Peak-gene p-values accuracy and violations<a class="anchor" aria-label="anchor" href="#guidelines_peakGene"></a>
</h2>
<div id="guidelines_peakGene" class="section level2">
<h2 class="hasAnchor">
<a href="#guidelines_peakGene" class="anchor"></a>Peak-gene p-values accuracy and violations</h2>
<p>Coming soon!</p>
</div>
</div>
<div class="section level1">
<h1 id="output">Output<a class="anchor" aria-label="anchor" href="#output"></a>
</h1>
<div id="output" class="section level1">
<h1 class="hasAnchor">
<a href="#output" class="anchor"></a>Output</h1>
<p>Here, we describe the various output files that are produced by the pipeline. They are described in the order they are produced in the pipeline.</p>
<div class="section level2">
<h2 id="output_GRN">GRN object<a class="anchor" aria-label="anchor" href="#output_GRN"></a>
</h2>
<div id="output_GRN" class="section level2">
<h2 class="hasAnchor">
<a href="#output_GRN" class="anchor"></a>GRN object</h2>
<p><strong>Our pipeline works with and outputs a so-called <code>GRN</code> object. The goal is simple: all information is stored in it, and by keeping everything within one object and sharing it with others, they have all the necessary data and information to run the <code>GRN</code> workflow. A consistent and simple workflow logic makes it easy and intuitive to work with, similar to other packages such as <code>DESeq2</code>.</strong></p>
<p>Technically speaking, it is an S4 object of class <code>GRN</code>. As you can see from the <a href="workflow.html">workflow vignette</a>, almost all <code>GRaNIE</code> functions return a <code>GRN</code> object (with the notable exception of <code>get</code> functions). All <code>GRaNIE</code> functions (except <code>initializeGRN</code>, which creates an empty <code>GRN</code> object) also require a <code>GRN</code> object as first argument, which makes it easy and intuitive to work with the package; at least this was the goal we had in mind. We would be happy to receive your feedback about it!</p>
<p><code>GRN</code> objects contain all data and results necessary for the various functions the package provides, and extractor functions such as the various <code>get</code> functions allow extracting information from a <code>GRN</code> object. In addition, printing a <code>GRN</code> object shows an object summary (try it out and just type <code>GRN</code> in the console if your <code>GRN</code> object is called like this!). In the future, we aim to add more convenience functions. If you have specific ideas, please let us know.</p>
<p>The slots of a <code>GRN</code> object are described in the R help, see <code>?GRN</code> for details. While we work on general extractor functions for the various slots for optimal user experience, we currently suggest also accessing and exploring the data directly with the <code>@</code> operator. For example, <code>GRN@config</code> accesses the configuration slot that contains all parameters and object metadata, and <code>slotNames(GRN)</code> prints all available slots of the object.</p>
<p>The slots of a <code>GRN</code> object are described in the R help, see <code>?GRN</code> for details. While we work on general extractor functions for the various slots for optimal user experience, we currently suggest also accessing and exploring the data directly with the <code><a href="https://rdrr.io/r/base/slotOp.html">@</a></code> operator. For example, <code>GRN@config</code> accesses the configuration slot that contains all parameters and object metadata, and <code>slotNames(GRN)</code> prints all available slots of the object.</p>
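<p>For readers less familiar with S4 objects, the following minimal sketch shows how <code>slotNames</code> and the <code>@</code> operator behave. It uses a made-up toy class <code>ToyGRN</code> so that it runs without any data; the slot contents and the parameter name <code>genomeAssembly</code> are purely illustrative and do not describe the actual layout of a <code>GRN</code> object (see <code>?GRN</code> for that).</p>

```r
library(methods)

# Hypothetical toy class, NOT the real GRN class definition
setClass("ToyGRN", slots = c(config = "list", data = "list"))

toy <- new("ToyGRN",
           config = list(parameters = list(genomeAssembly = "hg38")),
           data   = list())

# List all available slots of the object
availableSlots <- slotNames(toy)

# Access a slot directly with the @ operator and drill down with $
assembly <- toy@config$parameters$genomeAssembly
```

<p>With a real object, the same two calls, <code>slotNames(GRN)</code> and <code>GRN@config</code>, are a quick way to explore what is stored.</p>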
</div>
<div class="section level2">
<h2 id="output_PCA">PCA plots and results<a class="anchor" aria-label="anchor" href="#output_PCA"></a>
</h2>
<div id="output_PCA" class="section level2">
<h2 class="hasAnchor">
<a href="#output_PCA" class="anchor"></a>PCA plots and results</h2>
<p>The pipeline outputs PCA plots for both peaks and RNA, and for both original (i.e., the counts the user provided as input) and normalized (i.e., the counts after normalization, if a normalization method has been provided) data. Thus, in total, 4 different PCA plots are produced, 2 per data modality (peaks and RNA) and 2 per data type (original and normalized counts).</p>
<p>Each PDF consists of three parts: PCA results based on the top 500, top 1000 and top 5000 features (see page headers). For each part, different plot types are available and briefly explained in the following:</p>
<ol style="list-style-type: decimal">
......@@ -340,9 +338,9 @@
</div>
<p>Currently, the actual PCA result data are not stored in the <code>GRN</code> object, but this will be available soon as well. We will update the Vignette once this is done and mention it in the Changelog.</p>
</div>
<div class="section level2">
<h2 id="output_TF_peak">TF-peak diagnostic plots<a class="anchor" aria-label="anchor" href="#output_TF_peak"></a>
</h2>
<div id="output_TF_peak" class="section level2">
<h2 class="hasAnchor">
<a href="#output_TF_peak" class="anchor"></a>TF-peak diagnostic plots</h2>
<p>TF-peak diagnostic plots are available for each TF, and they currently look as follows:</p>
<div class="figure">
<img src="figs/TFPeak_fdr_orig/p-25.png" alt="&lt;i&gt;TF-peak diagnostic plots for an example TF&lt;/i&gt;" width="100%"><p class="caption">
......@@ -351,15 +349,15 @@
</div>
<p>The TF name is indicated in the title, and each page shows two plots. In each plot, the TF-peak FDR for each correlation bin (ranging from -1 to 1 in bins of size 0.05) is shown. The only difference between the two plots is the directionality from which the FDR is empirically derived: the upper plot is for the <em>positive</em> and the lower plot for the <em>negative</em> direction. Each plot is also colored by the number of distinct TF-peak connections that fall into the particular bin. Mostly, correlation bins with smaller absolute correlation values have higher frequencies (i.e., more TF-peak links fall into them), while correlation bins with more extreme correlation values are less frequent. In the end, for the resulting network, the directionality can be ignored, and only those TF-peak links with small FDRs are kept, irrespective of the directionality.</p>
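<p>The binning of correlations into bins of size 0.05 can be sketched in base R as follows. The correlation values are made up, and the sketch covers only the binning and counting step, not the empirical FDR calculation itself; how the package treats bin boundaries is an assumption here.</p>

```r
# Made-up TF-peak correlation values for one TF
correlations <- c(-0.82, -0.31, -0.04, -0.02, 0.01, 0.03, 0.04, 0.27, 0.66)

# 41 break points from -1 to 1 give 40 bins of width 0.05
binBreaks <- seq(-1, 1, length.out = 41)

# Assign each correlation to its bin (right-closed intervals, assumed)
bins <- cut(correlations, breaks = binBreaks, include.lowest = TRUE)

# Number of TF-peak links per bin; this is what the plot color encodes
linksPerBin <- table(bins)
```

<p>In this toy example, the two bins around zero contain five of the nine links, mirroring the observation that bins with small absolute correlations are usually the most populated.</p>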
</div>
<div class="section level2">
<h2 id="output_AR">Activator-repressor classification diagnostic plots and results<a class="anchor" aria-label="anchor" href="#output_AR"></a>