<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Workflow example • GRaNIE</title>
<!-- favicons --><link rel="icon" type="image/png" sizes="16x16" href="../favicon-16x16.png">
<link rel="icon" type="image/png" sizes="32x32" href="../favicon-32x32.png">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="../apple-touch-icon.png">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="../apple-touch-icon-120x120.png">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="../apple-touch-icon-76x76.png">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="../apple-touch-icon-60x60.png">
<!-- jquery --><script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js" integrity="sha256-CSXorXvZcTkaix6Yvo6HppcZGetbYMGWSFlBw8HfCJo=" crossorigin="anonymous"></script><!-- Bootstrap --><link href="https://cdnjs.cloudflare.com/ajax/libs/bootswatch/3.4.0/flatly/bootstrap.min.css" rel="stylesheet" crossorigin="anonymous">
<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.4.1/js/bootstrap.min.js" integrity="sha256-nuL8/2cJ5NDSSwnKD8VqreErSWHtnEP9E7AySL+1ev4=" crossorigin="anonymous"></script><!-- bootstrap-toc --><link rel="stylesheet" href="../bootstrap-toc.css">
<script src="../bootstrap-toc.js"></script><!-- Font Awesome icons --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/all.min.css" integrity="sha256-mmgLkCYLUQbXn0B1SRqzHar6dCnv9oZFPEC1g1cwlkk=" crossorigin="anonymous">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/v4-shims.min.css" integrity="sha256-wZjR52fzng1pJHwx4aV2AO3yyTOXrcDW7jBpJtTwVxw=" crossorigin="anonymous">
<!-- clipboard.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><!-- headroom.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/headroom.min.js" integrity="sha256-AsUX4SJE1+yuDu5+mAVzJbuYNPHj/WroHuZ8Ir/CkE0=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/jQuery.headroom.min.js" integrity="sha256-ZX/yNShbjqsohH1k95liqY9Gd8uOiE1S4vZc+9KQ1K4=" crossorigin="anonymous"></script><!-- pkgdown --><link href="../pkgdown.css" rel="stylesheet">
<script src="../pkgdown.js"></script><meta property="og:title" content="Workflow example">
<meta property="og:description" content="GRaNIE">
<!-- mathjax --><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script><!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]--><!-- Global site tag (gtag.js) - Google Analytics --><script async src="https://www.googletagmanager.com/gtag/js?id=G-530L9SXFM1"></script><script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'G-530L9SXFM1');
</script>
</head>
<body data-spy="scroll" data-target="#toc">
    

    <div class="container template-article">
      <header><div class="navbar navbar-default navbar-fixed-top" role="navigation">
  <div class="container">
    <div class="navbar-header">
      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false">
        <span class="sr-only">Toggle navigation</span>
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
      </button>
      <span class="navbar-brand">
        <a class="navbar-link" href="../index.html">GRaNIE</a>
        <span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="">0.14.5</span>
      </span>
    </div>

    <div id="navbar" class="navbar-collapse collapse">
      <ul class="nav navbar-nav">
<li>
  <a href="../index.html"></a>
</li>
<li>
  <a href="../articles/quickStart.html">Getting Started</a>
</li>
<li class="dropdown">
  <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">
    Vignettes
     
    <span class="caret"></span>
  </a>
  <ul class="dropdown-menu" role="menu">
<li>
      <a href="../articles/quickStart.html">Getting Started</a>
    </li>
    <li>
      <a href="../articles/packageDetails.html">Package Details</a>
    </li>
    <li>
      <a href="../articles/workflow.html">Workflow example</a>
    </li>
  </ul>
</li>
<li>
  <a href="../reference/index.html">Reference</a>
</li>
<li>
  <a href="../news/index.html">Changelog &amp; News</a>
</li>
      </ul>
<ul class="nav navbar-nav navbar-right"></ul>
</div>
<!--/.nav-collapse -->
  </div>
<!--/.container -->
</div>
<!--/.navbar -->

      

      </header><div class="row">
  <div class="col-md-9 contents">
    <div class="page-header toc-ignore">
      <h1 data-toc-skip>Workflow example</h1>
                        <h4 data-toc-skip class="author">Christian Arnold, Judith Zaugg, Rim Moussa</h4>
            
            <h4 data-toc-skip class="date">21 January 2022</h4>
      
      
      <div class="hidden name"><code>workflow.Rmd</code></div>

    </div>

    
        <div class="abstract">
      <p class="abstract">Abstract</p>
      <p>This workflow vignette shows how to use the <code>GRaNIE</code> package in a real-world example. For this purpose, you will use data from the <code>GRaNIEData</code> package in a more complex analysis that illustrates most of the features of <code>GRaNIE</code>. Importantly, you will also learn in detail how to work with a <code>GRaNIE</code> object and what its main functions and properties are. The vignette will be continuously updated whenever new functionality becomes available or when we receive user feedback.</p>
    </div>
    
<style type="text/css">
pre {
  max-height: 300px;
  overflow-y: auto;
}
pre[class] {
  max-height: 100px;
}
</style>
<style type="text/css">
.scroll-100 {
  max-height: 100px;
  overflow-y: auto;
  background-color: inherit;
}
</style>
<style type="text/css">
.scroll-200 {
  max-height: 200px;
  overflow-y: auto;
  background-color: inherit;
}
</style>
<style type="text/css">
.scroll-300 {
  max-height: 300px;
  overflow-y: auto;
  background-color: inherit;
}
</style>
<div class="section level2">
<h2 id="example-workflow">Example Workflow<a class="anchor" aria-label="anchor" href="#example-workflow"></a>
</h2>
<p><a name="section1"></a></p>
<p>In the following example, you will use data from the <code>GRaNIEData</code> package to construct an eGRN from ATAC-seq and RNA-seq data as well as transcription factor data.</p>
<p>First, let’s load the required libraries <code>GRaNIE</code> and <code>GRaNIEData</code>. The <code>tidyverse</code> package is already loaded and attached when loading the <code>GRaNIE</code> package, but we nevertheless load it here explicitly to highlight that we’ll use various <code>tidyverse</code> functions for data import.</p>
<p>For reasons of brevity, we omit the output of this code chunk.</p>
<div class="sourceCode" id="cb1"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="kw"><a href="https://rdrr.io/r/base/library.html" class="external-link">library</a></span><span class="op">(</span><span class="va"><a href="https://tidyverse.tidyverse.org" class="external-link">tidyverse</a></span><span class="op">)</span>
<span class="kw"><a href="https://rdrr.io/r/base/library.html" class="external-link">library</a></span><span class="op">(</span><span class="va">GRaNIEData</span><span class="op">)</span>
<span class="kw"><a href="https://rdrr.io/r/base/library.html" class="external-link">library</a></span><span class="op">(</span><span class="va"><a href="https://grp-zaugg.embl-community.io/GRaNIE" class="external-link">GRaNIE</a></span><span class="op">)</span></code></pre></div>
<div class="section level3">
<h3 id="general-notes">General notes<a class="anchor" aria-label="anchor" href="#general-notes"></a>
</h3>
<p>Each of the <code>GRaNIE</code> functions mentioned in this vignette comes with sensible default parameters that we found to work well for most of the datasets we have tested so far. However, <strong>always check the validity and usefulness of the parameters before starting an analysis</strong> to avoid unreasonable results.</p>
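<p>One way to review defaults before running anything is to list them programmatically. The following is a minimal sketch in base <em>R</em>; <code>exampleStep</code> and its arguments are hypothetical stand-ins and not part of <code>GRaNIE</code>. For an actual package function, the help page (<code>?functionName</code>) documents what each parameter means.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R"># exampleStep is a hypothetical stand-in for a package function;
# its arguments and default values are invented for illustration
exampleStep = function(data, minCount = 5, method = "quantile") {
    NULL
}

# formals() returns the argument list of a function, including defaults
defaults = as.list(formals(exampleStep))

# Show only the arguments that have a default we may want to review
str(defaults[c("minCount", "method")])</code></pre></div>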
</div>
<div class="section level3">
<h3 id="reading-the-data-required-for-the-granie-package">Reading the data required for the <em>GRaNIE</em> package<a class="anchor" aria-label="anchor" href="#reading-the-data-required-for-the-granie-package"></a>
</h3>
<p>To set up a <em>GRaNIE</em> analysis, we first need to read some data into <em>R</em>. The following data are required for the <em>GRaNIE</em> package:</p>
<ul>
<li>open chromatin / peak data (for example, from ATAC-Seq, DNase-Seq or ChIP-Seq), hereafter simply referred to as <em>peaks</em>
</li>
<li>RNA-Seq data (gene expression counts for genes across samples)</li>
</ul>
<p>The following data can be used optionally but are not required:</p>
<ul>
<li>sample metadata (e.g., sex, gender, age, sequencing batch)</li>
<li>TADs (topologically associating domains; BED file)</li>
</ul>
<p>So, let’s import the peak and RNA-seq data as data frames, along with some sample metadata. You can do this in any way you like, as long as you end up with the right format.</p>
<div class="sourceCode" id="cb2"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="op">(</span><span class="va">files</span> <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/list.files.html" class="external-link">list.files</a></span><span class="op">(</span>pattern <span class="op">=</span> <span class="st">"*"</span>, <span class="fu"><a href="https://rdrr.io/r/base/system.file.html" class="external-link">system.file</a></span><span class="op">(</span><span class="st">"extdata"</span>, package <span class="op">=</span> <span class="st">"GRaNIEData"</span><span class="op">)</span>,
    full.names <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## [1] "/media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/countsATAC.75k.tsv.gz"   </span>
<span class="co">## [2] "/media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/countsRNA.sampled.tsv.gz"</span>
<span class="co">## [3] "/media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/metadata.sampled.tsv"    </span>
<span class="co">## [4] "/media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/TFBS_selected"</span></code></pre>
<div class="sourceCode" id="cb4"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">file_peaks</span> <span class="op">=</span> <span class="va">files</span><span class="op">[</span><span class="fu"><a href="https://rdrr.io/r/base/grep.html" class="external-link">grep</a></span><span class="op">(</span><span class="st">"countsATAC.75k.tsv.gz"</span>, <span class="va">files</span><span class="op">)</span><span class="op">]</span>
<span class="va">file_RNA</span> <span class="op">=</span> <span class="va">files</span><span class="op">[</span><span class="fu"><a href="https://rdrr.io/r/base/grep.html" class="external-link">grep</a></span><span class="op">(</span><span class="st">"countsRNA.sampled.tsv.gz"</span>, <span class="va">files</span><span class="op">)</span><span class="op">]</span>
<span class="va">file_sampleMetadata</span> <span class="op">=</span> <span class="va">files</span><span class="op">[</span><span class="fu"><a href="https://rdrr.io/r/base/grep.html" class="external-link">grep</a></span><span class="op">(</span><span class="st">"metadata.sampled.tsv"</span>, <span class="va">files</span><span class="op">)</span><span class="op">]</span>
<span class="va">folder_TFBS_first50</span> <span class="op">=</span> <span class="va">files</span><span class="op">[</span><span class="fu"><a href="https://rdrr.io/r/base/grep.html" class="external-link">grep</a></span><span class="op">(</span><span class="st">"TFBS_selected"</span>, <span class="va">files</span><span class="op">)</span><span class="op">]</span>

<span class="va">countsRNA.df</span> <span class="op">=</span> <span class="fu"><a href="https://readr.tidyverse.org/reference/read_delim.html" class="external-link">read_tsv</a></span><span class="op">(</span><span class="va">file_RNA</span>, col_types <span class="op">=</span> <span class="fu"><a href="https://readr.tidyverse.org/reference/cols.html" class="external-link">cols</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span>
<span class="va">countsPeaks.df</span> <span class="op">=</span> <span class="fu"><a href="https://readr.tidyverse.org/reference/read_delim.html" class="external-link">read_tsv</a></span><span class="op">(</span><span class="va">file_peaks</span>, col_types <span class="op">=</span> <span class="fu"><a href="https://readr.tidyverse.org/reference/cols.html" class="external-link">cols</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span>
<span class="va">sampleMetadata.df</span> <span class="op">=</span> <span class="fu"><a href="https://readr.tidyverse.org/reference/read_delim.html" class="external-link">read_tsv</a></span><span class="op">(</span><span class="va">file_sampleMetadata</span>, col_types <span class="op">=</span> <span class="fu"><a href="https://readr.tidyverse.org/reference/cols.html" class="external-link">cols</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span>

<span class="co"># Let's check what the data look like</span>
<span class="va">countsRNA.df</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## <span style="color: #949494;"># A tibble: 35,033 × 30</span></span>
<span class="co">##    ENSEMBL babk_D bima_D cicb_D coyi_D diku_D eipl_D eiwy_D eofe_D fafq_D febc_D</span>
<span class="co">##    <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span>    <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span></span>
<span class="co">## <span style="color: #BCBCBC;"> 1</span> ENSG00…  <span style="text-decoration: underline;">48</span>933  <span style="text-decoration: underline;">48</span>737  <span style="text-decoration: underline;">60</span>581  <span style="text-decoration: underline;">93</span>101  <span style="text-decoration: underline;">84</span>980  <span style="text-decoration: underline;">91</span>536  <span style="text-decoration: underline;">85</span>728  <span style="text-decoration: underline;">35</span>483  <span style="text-decoration: underline;">69</span>674  <span style="text-decoration: underline;">58</span>890</span>
<span class="co">## <span style="color: #BCBCBC;"> 2</span> ENSG00…  <span style="text-decoration: underline;">49</span>916  <span style="text-decoration: underline;">44</span>086  <span style="text-decoration: underline;">50</span>706  <span style="text-decoration: underline;">55</span>893  <span style="text-decoration: underline;">57</span>239  <span style="text-decoration: underline;">76</span>418  <span style="text-decoration: underline;">75</span>934  <span style="text-decoration: underline;">27</span>926  <span style="text-decoration: underline;">57</span>526  <span style="text-decoration: underline;">50</span>491</span>
<span class="co">## <span style="color: #BCBCBC;"> 3</span> ENSG00… <span style="text-decoration: underline;">281</span>733 <span style="text-decoration: underline;">211</span>703 <span style="text-decoration: underline;">269</span>460 <span style="text-decoration: underline;">239</span>116 <span style="text-decoration: underline;">284</span>509 <span style="text-decoration: underline;">389</span>989 <span style="text-decoration: underline;">351</span>867 <span style="text-decoration: underline;">164</span>615 <span style="text-decoration: underline;">257</span>471 <span style="text-decoration: underline;">304</span>203</span>
<span class="co">## <span style="color: #BCBCBC;"> 4</span> ENSG00…  <span style="text-decoration: underline;">98</span>943  <span style="text-decoration: underline;">77</span>503  <span style="text-decoration: underline;">92</span>402  <span style="text-decoration: underline;">80</span>927  <span style="text-decoration: underline;">96</span>690 <span style="text-decoration: underline;">138</span>149 <span style="text-decoration: underline;">115</span>875  <span style="text-decoration: underline;">64</span>368  <span style="text-decoration: underline;">91</span>627 <span style="text-decoration: underline;">100</span>039</span>
<span class="co">## <span style="color: #BCBCBC;"> 5</span> ENSG00…  <span style="text-decoration: underline;">14</span>749  <span style="text-decoration: underline;">15</span>571  <span style="text-decoration: underline;">16</span>540  <span style="text-decoration: underline;">16</span>383  <span style="text-decoration: underline;">16</span>886  <span style="text-decoration: underline;">21</span>892  <span style="text-decoration: underline;">18</span>045  <span style="text-decoration: underline;">10</span>026  <span style="text-decoration: underline;">14</span>663  <span style="text-decoration: underline;">15</span>830</span>
<span class="co">## <span style="color: #BCBCBC;"> 6</span> ENSG00…  <span style="text-decoration: underline;">64</span>459  <span style="text-decoration: underline;">63</span>734  <span style="text-decoration: underline;">71</span>317  <span style="text-decoration: underline;">69</span>612  <span style="text-decoration: underline;">72</span>097 <span style="text-decoration: underline;">100</span>487  <span style="text-decoration: underline;">78</span>536  <span style="text-decoration: underline;">38</span>572  <span style="text-decoration: underline;">65</span>446  <span style="text-decoration: underline;">76</span>910</span>
<span class="co">## <span style="color: #BCBCBC;"> 7</span> ENSG00…  <span style="text-decoration: underline;">57</span>449  <span style="text-decoration: underline;">55</span>736  <span style="text-decoration: underline;">70</span>798  <span style="text-decoration: underline;">66</span>334  <span style="text-decoration: underline;">66</span>424  <span style="text-decoration: underline;">91</span>801  <span style="text-decoration: underline;">94</span>729  <span style="text-decoration: underline;">40</span>413  <span style="text-decoration: underline;">56</span>916  <span style="text-decoration: underline;">66</span>382</span>
<span class="co">## <span style="color: #BCBCBC;"> 8</span> ENSG00…  <span style="text-decoration: underline;">15</span>451  <span style="text-decoration: underline;">15</span>570  <span style="text-decoration: underline;">15</span>534  <span style="text-decoration: underline;">15</span>945  <span style="text-decoration: underline;">10</span>583  <span style="text-decoration: underline;">22</span>601  <span style="text-decoration: underline;">16</span>086   <span style="text-decoration: underline;">9</span>275  <span style="text-decoration: underline;">16</span>092  <span style="text-decoration: underline;">15</span>291</span>
<span class="co">## <span style="color: #BCBCBC;"> 9</span> ENSG00…  <span style="text-decoration: underline;">18</span>717  <span style="text-decoration: underline;">18</span>757  <span style="text-decoration: underline;">20</span>051  <span style="text-decoration: underline;">18</span>066  <span style="text-decoration: underline;">19</span>648  <span style="text-decoration: underline;">28</span>572  <span style="text-decoration: underline;">25</span>240  <span style="text-decoration: underline;">11</span>258  <span style="text-decoration: underline;">17</span>739  <span style="text-decoration: underline;">20</span>347</span>
<span class="co">## <span style="color: #BCBCBC;">10</span> ENSG00… <span style="text-decoration: underline;">168</span>054 <span style="text-decoration: underline;">147</span>822 <span style="text-decoration: underline;">178</span>164 <span style="text-decoration: underline;">154</span>220 <span style="text-decoration: underline;">168</span>837 <span style="text-decoration: underline;">244</span>731 <span style="text-decoration: underline;">215</span>862  <span style="text-decoration: underline;">89</span>368 <span style="text-decoration: underline;">158</span>845 <span style="text-decoration: underline;">180</span>734</span>
<span class="co">## <span style="color: #949494;"># … with 35,023 more rows, and 19 more variables: fikt_D &lt;dbl&gt;, guss_D &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   hayt_D &lt;dbl&gt;, hehd_D &lt;dbl&gt;, heja_D &lt;dbl&gt;, hiaf_D &lt;dbl&gt;, iill_D &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   kuxp_D &lt;dbl&gt;, nukw_D &lt;dbl&gt;, oapg_D &lt;dbl&gt;, oevr_D &lt;dbl&gt;, pamv_D &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   pelm_D &lt;dbl&gt;, podx_D &lt;dbl&gt;, qolg_D &lt;dbl&gt;, sojd_D &lt;dbl&gt;, vass_D &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   xugn_D &lt;dbl&gt;, zaui_D &lt;dbl&gt;</span></span></code></pre>
<div class="sourceCode" id="cb6"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">countsPeaks.df</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## <span style="color: #949494;"># A tibble: 75,000 × 32</span></span>
<span class="co">##    peakID  babk_D bima_D cicb_D coyi_D diku_D eipl_D eiwy_D eofe_D fafq_D febc_D</span>
<span class="co">##    <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span>    <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>  <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span></span>
<span class="co">## <span style="color: #BCBCBC;"> 1</span> chr14:…      2      5      5      3      1      4      1      5      0     13</span>
<span class="co">## <span style="color: #BCBCBC;"> 2</span> chrX:1…      3      7     10      5      4      6      3     18      4     22</span>
<span class="co">## <span style="color: #BCBCBC;"> 3</span> chr15:…      5     28     38     11     20     19      7     53      5     22</span>
<span class="co">## <span style="color: #BCBCBC;"> 4</span> chr10:…      0     12      7      2      5      8      0     11      1     11</span>
<span class="co">## <span style="color: #BCBCBC;"> 5</span> chr12:…      5     14     18      5      3     13      5     15      2     25</span>
<span class="co">## <span style="color: #BCBCBC;"> 6</span> chr1:1…     12     21     36      6     20     29     12     44      2    105</span>
<span class="co">## <span style="color: #BCBCBC;"> 7</span> chr16:…      3     17     16      9      8     16      6     28      3     33</span>
<span class="co">## <span style="color: #BCBCBC;"> 8</span> chr17:…      4     11      6      3      0      3      2      9      1     14</span>
<span class="co">## <span style="color: #BCBCBC;"> 9</span> chr13:…     10     34     44     12     31     29      9     22      5     82</span>
<span class="co">## <span style="color: #BCBCBC;">10</span> chr1:2…     21    113     46     28     44     57     47    146     12     91</span>
<span class="co">## <span style="color: #949494;"># … with 74,990 more rows, and 21 more variables: fikt_D &lt;dbl&gt;, guss_D &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   hayt_D &lt;dbl&gt;, hehd_D &lt;dbl&gt;, heja_D &lt;dbl&gt;, hiaf_D &lt;dbl&gt;, iill_D &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   kuxp_D &lt;dbl&gt;, nukw_D &lt;dbl&gt;, oapg_D &lt;dbl&gt;, oevr_D &lt;dbl&gt;, pamv_D &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   pelm_D &lt;dbl&gt;, podx_D &lt;dbl&gt;, qolg_D &lt;dbl&gt;, sojd_D &lt;dbl&gt;, vass_D &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   xugn_D &lt;dbl&gt;, zaui_D &lt;dbl&gt;, uaqe_D &lt;dbl&gt;, qaqx_D &lt;dbl&gt;</span></span></code></pre>
<div class="sourceCode" id="cb8"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">sampleMetadata.df</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## <span style="color: #949494;"># A tibble: 31 × 16</span></span>
<span class="co">##    sample_id assigned assigned_frac atac_date  clone condition  diff_start donor</span>
<span class="co">##    <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span>        <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>         <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span> <span style="color: #949494; font-style: italic;">&lt;date&gt;</span>     <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span> <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span>      <span style="color: #949494; font-style: italic;">&lt;date&gt;</span>     <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span></span>
<span class="co">## <span style="color: #BCBCBC;"> 1</span> babk_D     5<span style="text-decoration: underline;">507</span>093         0.211 2015-12-04     2 IFNg_SL13… 2015-10-12 babk </span>
<span class="co">## <span style="color: #BCBCBC;"> 2</span> bima_D    23<span style="text-decoration: underline;">275</span>756         0.677 2014-12-12     1 IFNg_SL13… 2014-11-07 bima </span>
<span class="co">## <span style="color: #BCBCBC;"> 3</span> cicb_D    19<span style="text-decoration: underline;">751</span>751         0.580 2015-04-24     3 IFNg_SL13… 2015-03-30 cicb </span>
<span class="co">## <span style="color: #BCBCBC;"> 4</span> coyi_D     6<span style="text-decoration: underline;">733</span>642         0.312 2015-11-05     3 IFNg_SL13… 2015-09-30 coyi </span>
<span class="co">## <span style="color: #BCBCBC;"> 5</span> diku_D     7<span style="text-decoration: underline;">010</span>213         0.195 2015-11-13     1 IFNg_SL13… 2015-10-15 diku </span>
<span class="co">## <span style="color: #BCBCBC;"> 6</span> eipl_D    16<span style="text-decoration: underline;">923</span>025         0.520 2015-08-04     1 IFNg_SL13… 2015-06-30 eipl </span>
<span class="co">## <span style="color: #BCBCBC;"> 7</span> eiwy_D     9<span style="text-decoration: underline;">807</span>860         0.404 2015-12-02     1 IFNg_SL13… 2015-10-23 eiwy </span>
<span class="co">## <span style="color: #BCBCBC;"> 8</span> eofe_D    25<span style="text-decoration: underline;">687</span>477         0.646 2014-12-12     1 IFNg_SL13… 2014-11-01 eofe </span>
<span class="co">## <span style="color: #BCBCBC;"> 9</span> fafq_D     4<span style="text-decoration: underline;">600</span>004         0.415 2015-10-14     1 IFNg_SL13… 2015-09-16 fafq </span>
<span class="co">## <span style="color: #BCBCBC;">10</span> febc_D    31<span style="text-decoration: underline;">712</span>153         0.430 2015-08-04     2 IFNg_SL13… 2015-07-06 febc </span>
<span class="co">## <span style="color: #949494;"># … with 21 more rows, and 8 more variables: EB_formation &lt;date&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   macrophage_diff_days &lt;dbl&gt;, medium_changes &lt;dbl&gt;, mt_frac &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   percent_duplication &lt;dbl&gt;, received_as &lt;chr&gt;, sex &lt;chr&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   short_long_ratio &lt;dbl&gt;</span></span></code></pre>
<div class="sourceCode" id="cb10"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="co"># Save the name of the respective ID columns</span>
<span class="va">idColumn_peaks</span> <span class="op">=</span> <span class="st">"peakID"</span>
<span class="va">idColumn_RNA</span> <span class="op">=</span> <span class="st">"ENSEMBL"</span>

<span class="co"># For the sake of simplicity, we only take a subset of all samples here to</span>
<span class="co"># speed up the vignette code</span>
<span class="co"># countsRNA.df = countsRNA.df[1:10000,1:10]</span>
<span class="co"># countsPeaks.df = countsPeaks.df[1:20000,1:10]</span></code></pre></div>
<p>While we recommend raw counts for both peaks and RNA-Seq as input and offer several normalization choices in the pipeline, it is also possible to provide pre-normalized data. Note that the normalization method may have a large influence on the resulting <em>eGRN</em>, so make sure the choice of normalization is reasonable. For more details, see the next sections.</p>
<p>As you can see, both the peak and the RNA-Seq counts must have exactly one ID column, with all other columns being numeric. For peaks, this column may be called <em>peakID</em>, for example, but the exact name is not important and can be specified as a parameter later when adding the data to the object. The same applies to the RNA-Seq data, for which a sensible choice is <em>ensemblID</em>, for example.</p>
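<p>The layout requirement above (one ID column, all remaining columns numeric) can be checked before handing a table to the pipeline. The following is a minimal sketch in base <em>R</em> on a toy table with invented values; <code>checkCountsLayout</code> is a hypothetical helper written for this illustration and is not a <code>GRaNIE</code> function.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R"># Toy counts table (invented values) mimicking the layout shown above
counts.toy = data.frame(
    ENSEMBL = c("ENSG_A", "ENSG_B", "ENSG_C"),
    babk_D = c(10, 0, 5),
    bima_D = c(3, 8, 1)
)
idColumn = "ENSEMBL"

# Hypothetical helper: TRUE if the ID column exists, at least one other
# column is present, and every other column is numeric
checkCountsLayout = function(df, idColumn) {
    otherColumns = setdiff(colnames(df), idColumn)
    isNumeric = vapply(df[otherColumns], is.numeric, logical(1))
    all(idColumn %in% colnames(df), length(otherColumns) != 0, isNumeric)
}

layoutOK = checkCountsLayout(counts.toy, idColumn)
layoutOK</code></pre></div>
<p>The same call can be applied to <code>countsRNA.df</code> and <code>countsPeaks.df</code> with their respective ID column names.</p>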
<p>For the peak ID column, the required format is “chr:start-end”, with <em>chr</em> denoting the chromosome, followed by “:”, and then <em>start</em>, “-”, and <em>end</em> for the peak start and end, respectively. As the coordinates for the peaks are needed in the pipeline, the format must be exactly as stated here.</p>
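<p>The “chr:start-end” requirement can be checked before the data are added to the object. The following is a minimal base R sketch, not part of <em>GRaNIE</em>: the helper names <code>isValidPeakID</code> and <code>parsePeakID</code> and the example IDs are made up for illustration.</p>

```r
# Hypothetical helper (not a GRaNIE function): TRUE for IDs of the form "chr:start-end"
isValidPeakID = function(ids) {
  grepl("^[^:]+:[0-9]+-[0-9]+$", ids)
}

# Hypothetical helper: split valid IDs into chromosome, start and end
parsePeakID = function(ids) {
  stopifnot(all(isValidPeakID(ids)))
  parts = regmatches(ids, regexec("^([^:]+):([0-9]+)-([0-9]+)$", ids))
  data.frame(
    chr = vapply(parts, function(p) p[2], character(1)),
    start = as.integer(vapply(parts, function(p) p[3], character(1))),
    end = as.integer(vapply(parts, function(p) p[4], character(1))),
    stringsAsFactors = FALSE
  )
}

# Made-up IDs; the last one uses underscores and is therefore rejected
peakIDs = c("chr1:1000-1500", "chrX:200-900", "chr2_1000_1500")
valid = isValidPeakID(peakIDs)
coords = parsePeakID(peakIDs[valid])
print(valid)
print(coords)
```

<p>Running such a check on the ID column of the peak counts before import makes malformed IDs visible early, rather than during a later pipeline step that needs the coordinates.</p>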
<p>You may notice that the peaks and RNA-seq data contain different sets of samples, and not all of them overlap. This is not a problem: as long as <em>some</em> samples are found in both, the <em>GRaNIE</em> pipeline can work with the data. Note, however, that only the samples shared between both data modalities are kept, so make sure that the sample names match between them and that they share as many samples as possible.</p>
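<p>To see in advance which samples would be shared, the column names of the two count tables can be compared. This is a small sketch with toy data frames; the sample names and feature IDs are invented, and the pipeline performs its own matching when the data are added.</p>

```r
# Toy count tables with one ID column each (all names and values are made up)
countsPeaks.toy = data.frame(peakID = c("chr1:100-200", "chr1:500-700"),
                             s1 = c(5, 8), s2 = c(3, 9), s3 = c(7, 1))
countsRNA.toy = data.frame(ENSEMBL = c("ENSG01", "ENSG02"),
                           s2 = c(10, 0), s3 = c(4, 6), s4 = c(2, 2))

# Sample names are all column names except the ID column
samplesPeaks = setdiff(colnames(countsPeaks.toy), "peakID")
samplesRNA = setdiff(colnames(countsRNA.toy), "ENSEMBL")

sharedSamples = intersect(samplesPeaks, samplesRNA)
onlyPeaks = setdiff(samplesPeaks, samplesRNA)
onlyRNA = setdiff(samplesRNA, samplesPeaks)

print(sharedSamples)
print(onlyPeaks)
print(onlyRNA)
```

<p>Samples that appear only in one table are worth a second look, since a mismatch in spelling or suffix is enough to exclude an otherwise valid sample.</p>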
</div>
<div class="section level3">
<h3 id="initialize-a-granie-object">Initialize a <em>GRaNIE</em> object<a class="anchor" aria-label="anchor" href="#initialize-a-granie-object"></a>
</h3>
<p>Now that we have all the data in the right format, we can start with our <em>GRaNIE</em> analysis! We start by specifying some parameters such as the genome assembly version the data have been produced with, as well as some optional object metadata that helps us to distinguish this <em>GRaNIE</em> object from others.</p>
<div class="sourceCode" id="cb11"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">genomeAssembly</span> <span class="op">=</span> <span class="st">"hg38"</span>  <span class="co">#Either hg19, hg38 or mm10. Both peaks and RNA data must have the same genome assembly</span>

<span class="co"># Optional and arbitrary list with information and metadata that is stored</span>
<span class="co"># within the GRaNIE object</span>
<span class="va">objectMetadata.l</span> <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/list.html" class="external-link">list</a></span><span class="op">(</span>name <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/paste.html" class="external-link">paste0</a></span><span class="op">(</span><span class="st">"Macrophages_infected_primed"</span><span class="op">)</span>, file_peaks <span class="op">=</span> <span class="va">file_peaks</span>,
    file_rna <span class="op">=</span> <span class="va">file_RNA</span>, file_sampleMetadata <span class="op">=</span> <span class="va">file_sampleMetadata</span>, genomeAssembly <span class="op">=</span> <span class="va">genomeAssembly</span><span class="op">)</span>

<span class="va">dir_output</span> <span class="op">=</span> <span class="st">"output"</span>

<span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/initializeGRN.html">initializeGRN</a></span><span class="op">(</span>objectMetadata <span class="op">=</span> <span class="va">objectMetadata.l</span>, outputFolder <span class="op">=</span> <span class="va">dir_output</span>,
    genomeAssembly <span class="op">=</span> <span class="va">genomeAssembly</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:16:01] Empty GRN object created successfully. Type the object name (e.g., GRN) to retrieve summary information about it at any time.</span></code></pre>
<div class="sourceCode" id="cb13"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## Object of class: GRaNIE  ( version 0.14.5 )</span>
<span class="co">## Data summary:</span>
<span class="co">##  Number of peaks: No peak data found.</span>
<span class="co">##  Number of genes: No RNA-seq data found.</span>
<span class="co">## Parameters:</span>
<span class="co">## Provided metadata:</span>
<span class="co">##   name :  Macrophages_infected_primed </span>
<span class="co">##   file_peaks :  /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/countsATAC.75k.tsv.gz </span>
<span class="co">##   file_rna :  /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/countsRNA.sampled.tsv.gz </span>
<span class="co">##   file_sampleMetadata :  /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/metadata.sampled.tsv </span>
<span class="co">##   genomeAssembly :  hg38 </span>
<span class="co">## Connections:</span>
<span class="co">##  Number of genes (filtered, all):  NA ,  NA</span></code></pre>
<p>A <em>GRaNIE</em> object is initialized with the function <code><a href="../reference/initializeGRN.html">initializeGRN()</a></code>, which is trivial: all we need to specify is an output folder (this is where all the pipeline output is automatically saved unless specified otherwise) and the genome assembly abbreviation of the data. We currently support <em>hg19</em>, <em>hg38</em>, and <em>mm10</em>. Please contact us if you need additional genomes. The <em>objectMetadata</em> argument is recommended but optional and may contain an arbitrarily complex named list that is stored as additional metadata for the <em>GRaNIE</em> object. Here, we decided to specify a name for the <em>GRaNIE</em> object as well as the original paths for all 3 input files and the genome assembly.</p>
<p>For more parameter details, see the R help (<code><a href="../reference/initializeGRN.html">?initializeGRN</a></code>).</p>
<p>At any point, we can simply “print” a <em>GRaNIE</em> object by typing its name, and a summary of its content is printed to the console.</p>
</div>
<div class="section level3">
<h3 id="add-data">Add data<a class="anchor" aria-label="anchor" href="#add-data"></a>
</h3>
<p>We are now ready to fill our empty object with data! With the data prepared, we can use the data import function <code><a href="../reference/addData.html">addData()</a></code> to import both peaks and RNA-seq data into the <em>GRaNIE</em> object. In addition to the count tables, we explicitly specify the names of the ID columns. As mentioned before, the sample metadata is optional but recommended if available.</p>
<p>An important consideration is data normalization for RNA and ATAC. We currently support three choices of normalization: <em>quantile</em>, <em>DESeq_sizeFactor</em>, and <em>none</em>; see the R help for more details (<code><a href="../reference/addData.html">?addData</a></code>). The default for RNA-Seq is a quantile normalization, while for the open chromatin peak data, it is <em>DESeq_sizeFactor</em> (i.e., a “regular” DESeq size factor normalization). Importantly, <em>DESeq_sizeFactor</em> requires raw data, while <em>quantile</em> does not necessarily. We nevertheless recommend raw data as input, although it is also possible to provide pre-normalized data and then either apply a further normalization method on top of it or choose “none”.</p>
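<p>To give an intuition for why size factor normalization needs raw counts, here is a base R sketch of the median-of-ratios idea behind DESeq size factors. It is an illustration on toy counts only, not the code that <em>GRaNIE</em> or DESeq2 runs; real implementations additionally handle features with zero counts.</p>

```r
# Toy raw counts: sample s2 was sequenced exactly twice as deeply as s1
counts = cbind(s1 = c(10, 20, 30, 40), s2 = c(20, 40, 60, 80))
rownames(counts) = paste0("feature", 1:4)

# Per-feature geometric mean across samples, computed on the log scale
logGeoMeans = rowMeans(log(counts))

# Size factor per sample: median ratio of its counts to the geometric means
sizeFactors = apply(counts, 2, function(sampleCounts) {
  exp(median(log(sampleCounts) - logGeoMeans))
})

# Divide each column by its size factor
countsNorm = sweep(counts, 2, sizeFactors, "/")

print(sizeFactors)
print(countsNorm)
```

<p>In this toy case the two samples become identical after normalization because they differ only in sequencing depth. If the input had already been scaled or transformed, the ratios would no longer reflect depth, which is why this method expects raw counts.</p>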
<div class="sourceCode" id="cb15"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/addData.html">addData</a></span><span class="op">(</span><span class="va">GRN</span>, <span class="va">countsPeaks.df</span>, normalization_peaks <span class="op">=</span> <span class="st">"DESeq_sizeFactor"</span>,
    idColumn_peaks <span class="op">=</span> <span class="va">idColumn_peaks</span>, <span class="va">countsRNA.df</span>, normalization_rna <span class="op">=</span> <span class="st">"quantile"</span>,
    idColumn_RNA <span class="op">=</span> <span class="va">idColumn_RNA</span>, sampleMetadata <span class="op">=</span> <span class="va">sampleMetadata.df</span>, forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:16:02] Normalize counts. Method: DESeq_sizeFactor, ID column: peakID</span>
<span class="co">## INFO [2022-01-21 13:16:08]  Finished successfully. Execution time: 6.1 secs</span>
<span class="co">## INFO [2022-01-21 13:16:08] Normalize counts. Method: quantile, ID column: ENSEMBL</span>
<span class="co">## INFO [2022-01-21 13:16:09]  Finished successfully. Execution time: 1 secs</span>
<span class="co">## INFO [2022-01-21 13:16:09] Subset RNA and peaks and keep only shared samples</span>
<span class="co">## INFO [2022-01-21 13:16:09]  Number of samples for RNA before filtering: 29</span>
<span class="co">## INFO [2022-01-21 13:16:09]  Number of samples for peaks before filtering: 31</span>
<span class="co">## INFO [2022-01-21 13:16:09]  29 samples (babk_D,bima_D,cicb_D,coyi_D,diku_D,eipl_D,eiwy_D,eofe_D,fafq_D,febc_D,fikt_D,guss_D,hayt_D,hehd_D,heja_D,hiaf_D,iill_D,kuxp_D,nukw_D,oapg_D,oevr_D,pamv_D,pelm_D,podx_D,qolg_D,sojd_D,vass_D,xugn_D,zaui_D) are shared between the peaks and RNA-Seq data</span>
<span class="co">## WARN [2022-01-21 13:16:09] The following samples from the peaks will be ignored for the classification due to missing overlap with RNA-Seq: uaqe_D,qaqx_D</span>
<span class="co">## INFO [2022-01-21 13:16:09]  Number of samples for RNA after filtering: 29</span>
<span class="co">## INFO [2022-01-21 13:16:09]  Number of samples for peaks data after filtering: 29</span>
<span class="co">## INFO [2022-01-21 13:16:09]  Finished successfully. Execution time: 0.1 secs</span>
<span class="co">## INFO [2022-01-21 13:16:09] Produce 1 permutations of RNA-counts</span>
<span class="co">## INFO [2022-01-21 13:16:09] Shuffling columns 1 times</span>
<span class="co">## INFO [2022-01-21 13:16:09]  Finished successfully. Execution time: 0 secs</span>
<span class="co">## INFO [2022-01-21 13:16:09] Parsing provided metadata...</span>
<span class="co">## INFO [2022-01-21 13:16:15] Check for overlapping peaks...</span>
<span class="co">## INFO [2022-01-21 13:16:17]  Calculate statistics for each peak (mean and CV)</span>
<span class="co">## INFO [2022-01-21 13:16:18]  Retrieve peak annotation using ChipSeeker. This may take a while</span>
<span class="co">## &gt;&gt; preparing features information...      2022-01-21 13:16:19 </span>
<span class="co">## &gt;&gt; identifying nearest features...        2022-01-21 13:16:22 </span>
<span class="co">## &gt;&gt; calculating distance from peak to TSS...   2022-01-21 13:16:23 </span>
<span class="co">## &gt;&gt; assigning genomic annotation...        2022-01-21 13:16:23 </span>
<span class="co">## &gt;&gt; adding gene annotation...          2022-01-21 13:16:58 </span>
<span class="co">## &gt;&gt; assigning chromosome lengths           2022-01-21 13:16:58 </span>
<span class="co">## &gt;&gt; done...                    2022-01-21 13:16:58 </span>
<span class="co">## INFO [2022-01-21 13:16:59] Calculate GC-content for peaks... </span>
<span class="co">## INFO [2022-01-21 13:17:02]  Finished successfully. Execution time: 3 secs</span>
<span class="co">## INFO [2022-01-21 13:17:02]  Calculate statistics for each gene (mean and CV)</span></code></pre>
<p>From the output, we can see details of the normalization methods used and the number of samples that are kept in the <em>GRaNIE</em> object. Here, all 29 samples from the RNA data are kept because they are also found in the peak data, while only 29 of the 31 samples from the peak data are also found in the RNA data, resulting in 29 shared samples overall. The RNA counts are also permuted; the permuted data are the basis for all analyses and plots in subsequent steps that repeat the analysis for permuted data in addition to the real, non-permuted data.</p>
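<p>The log line “Shuffling columns 1 times” suggests that the permutation reassigns expression profiles to samples. The sketch below shows one way such a column shuffle could look in base R. It is an assumption for illustration, with toy data and invented object names, and is not taken from the package source.</p>

```r
set.seed(1)

# Toy RNA count matrix: 3 genes x 4 samples (values are made up)
countsRNA.toy = matrix(1:12, nrow = 3,
                       dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# Shuffle the columns, then restore the original sample names so that each
# sample name now points to the expression profile of a randomly chosen sample
permutedOrder = sample(ncol(countsRNA.toy))
countsRNA.perm = countsRNA.toy[, permutedOrder]
colnames(countsRNA.perm) = colnames(countsRNA.toy)

print(countsRNA.perm)
```

<p>A shuffle of this kind keeps every expression profile intact but breaks its pairing with the peak data of the same sample, which is what makes the permuted data usable as a background for comparison.</p>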
</div>
<div class="section level3">
<h3 id="quality-control-1-pca-plots">Quality control 1: PCA plots<a class="anchor" aria-label="anchor" href="#quality-control-1-pca-plots"></a>
</h3>
<p>It is time for our first QC plots using the function <code><a href="../reference/plotPCA_all.html">plotPCA_all()</a></code>! Now that we have added peak and RNA data to the object, let’s use a <em>Principal Component Analysis</em> (PCA) to check where the variation in the data comes from. The PCA is run for both peak and RNA-seq data, and for both the original input and the normalized data (unless normalization has been set to none, in which case the normalized data are identical to the original data). If sample metadata has been provided in the <code><a href="../reference/addData.html">addData()</a></code> function (something we strongly recommend), it is automatically added to the PCA plots by coloring the PCA results according to the provided metadata, so that potential batch effects can be examined and identified. For more details, see the R help (<code><a href="../reference/plotPCA_all.html">?plotPCA_all</a></code>).</p>
<p>Note that while this step is recommended, it is fully optional from a workflow point of view.</p>
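<p>For readers unfamiliar with the idea of a PCA on the most variable features, the following base R sketch shows the general technique on simulated data. It uses <code>prcomp</code> from the <em>stats</em> package and is not the implementation behind <code>plotPCA_all()</code>; the matrix, the object names and the value of <code>topn</code> are chosen for illustration only.</p>

```r
set.seed(42)

# Simulated normalized matrix: 200 features x 6 samples
mat = matrix(rnorm(200 * 6), nrow = 200,
             dimnames = list(paste0("feature", 1:200), paste0("s", 1:6)))

# Keep only the topn features with the highest variance across samples
topn = 50
featureVar = apply(mat, 1, var)
topFeatures = names(sort(featureVar, decreasing = TRUE))[1:topn]

# prcomp expects samples in rows, so transpose the feature x sample matrix
pca = prcomp(t(mat[topFeatures, ]), center = TRUE, scale. = FALSE)

# Fraction of the total variance explained by each principal component
varExplained = pca$sdev^2 / sum(pca$sdev^2)
print(round(varExplained, 3))
```

<p>The sample coordinates in <code>pca$x</code> are what would typically be plotted and colored by metadata variables to look for batch effects.</p>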
<div class="sourceCode" id="cb17"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/plotPCA_all.html">plotPCA_all</a></span><span class="op">(</span><span class="va">GRN</span>, type <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="st">"rna"</span>, <span class="st">"peaks"</span><span class="op">)</span>, topn <span class="op">=</span> <span class="fl">500</span>, forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span>
<span class="co">## INFO [2022-01-21 13:17:03] </span>
<span class="co">## Plotting PCA and metadata correlation of raw RNA data for all shared samples to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_RNA.raw.pdf... This may take a few minutes</span>
<span class="co">## INFO [2022-01-21 13:17:05] Prepare PCA. Count transformation: vst</span>
<span class="co">## INFO [2022-01-21 13:17:05]  Writing to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_RNA.raw.pdf</span>
<span class="co">## INFO [2022-01-21 13:17:07] Performing and summarizing PCs across metadata for top 500 features</span>
<span class="co">## INFO [2022-01-21 13:17:10] </span>
<span class="co">## Plotting PCA and metadata correlation of normalized RNA data for all shared samples to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_RNA.normalized.pdf... This may take a few minutes</span>
<span class="co">## INFO [2022-01-21 13:17:10] Prepare PCA. Count transformation: none</span>
<span class="co">## INFO [2022-01-21 13:17:10]  Writing to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_RNA.normalized.pdf</span>
<span class="co">## INFO [2022-01-21 13:17:12] Performing and summarizing PCs across metadata for top 500 features</span>
<span class="co">## INFO [2022-01-21 13:17:14] Plotting PCA and metadata correlation of raw peaks data for all shared samples to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_peaks.raw.pdf... This may take a few minutes</span>
<span class="co">## INFO [2022-01-21 13:17:18] Prepare PCA. Count transformation: vst</span>
<span class="co">## INFO [2022-01-21 13:17:18]  Writing to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_peaks.raw.pdf</span>
<span class="co">## INFO [2022-01-21 13:17:21] Performing and summarizing PCs across metadata for top 500 features</span>
<span class="co">## INFO [2022-01-21 13:17:24] Plotting PCA and metadata correlation of normalized peaks data for all shared samples to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_peaks.normalized.pdf... This may take a few minutes</span>
<span class="co">## INFO [2022-01-21 13:17:24] Prepare PCA. Count transformation: none</span>
<span class="co">## INFO [2022-01-21 13:17:24]  Writing to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_peaks.normalized.pdf</span>
<span class="co">## INFO [2022-01-21 13:17:27] Performing and summarizing PCs across metadata for top 500 features</span></code></pre></div>
<p>We can see from the output that four PDF files have been produced (raw and normalized data for both RNA and peaks), each of which plots the PCA results for the most variable features; here, the top 500, as specified via <code>topn = 500</code>. For reasons of brevity and organization, we describe their interpretation and meaning in detail in the Introductory vignette, which also provides guidance and example plots, and not here.</p>
</div>
<div class="section level3">
<h3 id="add-tfs-and-tfbs-and-overlap-with-peaks">Add TFs and TFBS and overlap with peaks<a class="anchor" aria-label="anchor" href="#add-tfs-and-tfbs-and-overlap-with-peaks"></a>
</h3>
<p>Now it is time to add data for TFs and predicted TF binding sites (TFBS)! The <em>GRaNIE</em> package requires pre-computed TFBS that need to be in a specific format (see the <a href="packageDetails.html">Package Details Vignette</a> for details). In brief, a 6-column bed file must be present for each TF, with a specific file name that starts with the name of the TF, followed by an arbitrary and optional suffix (here: “_TFBS”) and a particular file ending (supported are <em>bed</em> or <em>bed.gz</em>; here, we specify the latter). All these files must be located in a particular folder that the <code><a href="../reference/addTFBS.html">addTFBS()</a></code> function then searches in order to identify those files that match the specified patterns. We provide example TFBS for the 3 genome assemblies we support; see the comment below and the <a href="packageDetails.html">Package Details Vignette</a> for details. After setting this up, we are ready to overlap the TFBS and the peaks by calling the function <code><a href="../reference/overlapPeaksAndTFBS.html">overlapPeaksAndTFBS()</a></code>.</p>
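<p>The file naming convention (TF name, then suffix, then file ending) can be illustrated with a few lines of base R. This sketch only mimics the matching logic on made-up file names in a temporary folder; how <code>addTFBS()</code> performs the search internally is not shown here, and the variable names simply mirror the argument names for readability.</p>

```r
# Temporary folder with made-up, empty files for illustration
folder_toy = file.path(tempdir(), "TFBS_toy")
dir.create(folder_toy, showWarnings = FALSE)
invisible(file.create(file.path(folder_toy,
    c("CTCF.0.A_TFBS.bed.gz", "EGR1.0.A_TFBS.bed.gz", "README.txt"))))

filesTFBSPattern = "_TFBS"
fileEnding = ".bed.gz"
suffix = paste0(filesTFBSPattern, fileEnding)

# Keep only files that end with the suffix plus file ending
allFiles = list.files(folder_toy)
files = allFiles[endsWith(allFiles, suffix)]

# The TF name is whatever precedes the suffix
TFs = substr(files, 1, nchar(files) - nchar(suffix))
print(TFs)
```

<p>Files that do not follow the convention, such as the <em>README.txt</em> in this toy folder, are simply not picked up, so a quick listing like this can help explain why an expected TF is missing.</p>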
<p>For more parameter details, see the R help (<code><a href="../reference/addTFBS.html">?addTFBS</a></code> and <code><a href="../reference/overlapPeaksAndTFBS.html">?overlapPeaksAndTFBS</a></code>).</p>
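<p>Conceptually, overlapping peaks and TFBS means finding, for each peak, the binding sites whose genomic interval intersects it. The base R sketch below illustrates this on toy coordinates. It treats intervals as closed, uses invented data, and is far less efficient than dedicated genomic interval tools, so it should be read as an explanation of the idea rather than as what <code>overlapPeaksAndTFBS()</code> does internally.</p>

```r
# Toy peaks and TFBS for a single TF (all coordinates are made up)
peaks = data.frame(peakID = c("chr1:100-200", "chr1:500-700", "chr2:100-300"),
                   chr = c("chr1", "chr1", "chr2"),
                   start = c(100, 500, 100), end = c(200, 700, 300))
tfbs = data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                  start = c(150, 180, 900, 50), end = c(160, 250, 910, 90))

# Pair every peak with every TFBS on the same chromosome
pairs = merge(peaks, tfbs, by = "chr", suffixes = c(".peak", ".tfbs"))

# Two closed intervals overlap if the smaller end is not before the larger start
isOverlap = pmin(pairs$end.peak, pairs$end.tfbs) >= pmax(pairs$start.peak, pairs$start.tfbs)
overlaps = pairs[isOverlap, ]

# Number of overlapping TFBS per peak, including peaks without any overlap
nOverlapsPerPeak = table(factor(overlaps$peakID, levels = peaks$peakID))
print(nOverlapsPerPeak)
```

<p>In this toy example only the first peak contains binding sites; the other two peaks have none and would therefore not be linked to this TF.</p>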
<div class="sourceCode" id="cb18"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/addTFBS.html">addTFBS</a></span><span class="op">(</span><span class="va">GRN</span>, motifFolder <span class="op">=</span> <span class="va">folder_TFBS_first50</span>, TFs <span class="op">=</span> <span class="st">"all"</span>, filesTFBSPattern <span class="op">=</span> <span class="st">"_TFBS"</span>,
    fileEnding <span class="op">=</span> <span class="st">".bed.gz"</span>, forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:17:30] Checking database folder for matching files: /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/TFBS_selected</span>
<span class="co">## INFO [2022-01-21 13:17:30] Found 75 matching TFs: AIRE.0.C, ANDR.0.A, ANDR.1.A, ANDR.2.A, AP2A.0.A, AP2B.0.B, ARI3A.0.D, ARNT2.0.D, ASCL1.0.A, ASCL2.0.D, ATF2.1.B, ATOH1.0.B, BACH1.0.A, BATF3.0.B, BC11A.0.A, BCL6.0.A, BHA15.0.B, BHE41.0.D, BPTF.0.D, BRAC.0.A, BRCA1.0.D, CDX1.0.C, CDX2.0.A, CEBPA.0.A, CENPB.0.D, CLOCK.0.C, COE1.0.A, COT1.0.C, COT1.1.C, COT2.0.A, COT2.1.A, CTCF.0.A, CTCFL.0.A, CUX2.0.D, DLX1.0.D, DLX2.0.D, DLX4.0.D, DLX6.0.D, DMBX1.0.D, DMRT1.0.D, E2F1.0.A, E2F3.0.A, E2F4.0.A, E2F6.0.A, E2F7.0.B, EGR1.0.A, EGR2.0.A, EGR2.1.A, EHF.0.B, ELF1.0.A, ELF3.0.A, ELK3.0.D, ERR1.0.A, ESR1.0.A, ESR1.1.A, ESR2.0.A, ESR2.1.A, ETS1.0.A, ETS2.0.B, ETV2.0.B, ETV4.0.B, ETV5.0.C, EVI1.0.B, FEZF1.0.C, FLI1.1.A, FOXA3.0.B, FOXB1.0.D, FOXC2.0.D, FOXD2.0.D, FOXD3.0.D, FOXF1.0.D, FOXO4.0.C, FOXP1.0.A, FOXP3.0.D, FUBP1.0.D</span>
<span class="co">## INFO [2022-01-21 13:17:30] Use all TF from the database folder /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/TFBS_selected</span>
<span class="co">## INFO [2022-01-21 13:17:30] Reading file /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/TFBS_selected/translationTable.csv</span>
<span class="co">## INFO [2022-01-21 13:17:30]  Finished successfully. Execution time: 0 secs</span>
<span class="co">## INFO [2022-01-21 13:17:30] Running the pipeline for 75 TF in total.</span></code></pre>
<div class="sourceCode" id="cb20"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/overlapPeaksAndTFBS.html">overlapPeaksAndTFBS</a></span><span class="op">(</span><span class="va">GRN</span>, nCores <span class="op">=</span> <span class="fl">1</span>, forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:17:30] Overlap peaks and TFBS using 1 cores. This may take a few minutes...</span>
<span class="co">## INFO [2022-01-21 13:17:32]  Calculating intersection for TF AIRE.0.C finished. Number of overlapping TFBS after filtering: 295</span>
<span class="co">## INFO [2022-01-21 13:17:34]  Calculating intersection for TF ANDR.0.A finished. Number of overlapping TFBS after filtering: 1182</span>
<span class="co">## INFO [2022-01-21 13:17:34]  Calculating intersection for TF ANDR.1.A finished. Number of overlapping TFBS after filtering: 1007</span>
<span class="co">## INFO [2022-01-21 13:17:35]  Calculating intersection for TF ANDR.2.A finished. Number of overlapping TFBS after filtering: 1385</span>
<span class="co">## INFO [2022-01-21 13:17:37]  Calculating intersection for TF ARI3A.0.D finished. Number of overlapping TFBS after filtering: 390</span>
<span class="co">## INFO [2022-01-21 13:17:38]  Calculating intersection for TF ARNT2.0.D finished. Number of overlapping TFBS after filtering: 1906</span>
<span class="co">## INFO [2022-01-21 13:17:39]  Calculating intersection for TF ASCL1.0.A finished. Number of overlapping TFBS after filtering: 3454</span>
<span class="co">## INFO [2022-01-21 13:17:40]  Calculating intersection for TF ASCL2.0.D finished. Number of overlapping TFBS after filtering: 2701</span>
<span class="co">## INFO [2022-01-21 13:17:41]  Calculating intersection for TF ATF2.1.B finished. Number of overlapping TFBS after filtering: 918</span>
<span class="co">## INFO [2022-01-21 13:17:42]  Calculating intersection for TF ATOH1.0.B finished. Number of overlapping TFBS after filtering: 2044</span>
<span class="co">## INFO [2022-01-21 13:17:43]  Calculating intersection for TF BACH1.0.A finished. Number of overlapping TFBS after filtering: 2786</span>
<span class="co">## INFO [2022-01-21 13:17:43]  Calculating intersection for TF BATF3.0.B finished. Number of overlapping TFBS after filtering: 1095</span>
<span class="co">## INFO [2022-01-21 13:17:46]  Calculating intersection for TF BC11A.0.A finished. Number of overlapping TFBS after filtering: 10545</span>
<span class="co">## INFO [2022-01-21 13:17:46]  Calculating intersection for TF BCL6.0.A finished. Number of overlapping TFBS after filtering: 1204</span>
<span class="co">## INFO [2022-01-21 13:17:47]  Calculating intersection for TF BHA15.0.B finished. Number of overlapping TFBS after filtering: 3414</span>
<span class="co">## INFO [2022-01-21 13:17:48]  Calculating intersection for TF BHE41.0.D finished. Number of overlapping TFBS after filtering: 1638</span>
<span class="co">## INFO [2022-01-21 13:17:49]  Calculating intersection for TF BPTF.0.D finished. Number of overlapping TFBS after filtering: 1388</span>
<span class="co">## INFO [2022-01-21 13:17:50]  Calculating intersection for TF BRCA1.0.D finished. Number of overlapping TFBS after filtering: 731</span>
<span class="co">## INFO [2022-01-21 13:17:51]  Calculating intersection for TF CDX1.0.C finished. Number of overlapping TFBS after filtering: 778</span>
<span class="co">## INFO [2022-01-21 13:17:52]  Calculating intersection for TF CDX2.0.A finished. Number of overlapping TFBS after filtering: 449</span>
<span class="co">## INFO [2022-01-21 13:17:53]  Calculating intersection for TF CEBPA.0.A finished. Number of overlapping TFBS after filtering: 1327</span>
<span class="co">## INFO [2022-01-21 13:17:54]  Calculating intersection for TF CENPB.0.D finished. Number of overlapping TFBS after filtering: 1057</span>
<span class="co">## INFO [2022-01-21 13:17:55]  Calculating intersection for TF CLOCK.0.C finished. Number of overlapping TFBS after filtering: 1328</span>
<span class="co">## INFO [2022-01-21 13:17:57]  Calculating intersection for TF CTCF.0.A finished. Number of overlapping TFBS after filtering: 8572</span>
<span class="co">## INFO [2022-01-21 13:17:58]  Calculating intersection for TF CTCFL.0.A finished. Number of overlapping TFBS after filtering: 8586</span>
<span class="co">## INFO [2022-01-21 13:17:59]  Calculating intersection for TF CUX2.0.D finished. Number of overlapping TFBS after filtering: 182</span>
<span class="co">## INFO [2022-01-21 13:17:59]  Calculating intersection for TF DLX1.0.D finished. Number of overlapping TFBS after filtering: 192</span>
<span class="co">## INFO [2022-01-21 13:18:00]  Calculating intersection for TF DLX2.0.D finished. Number of overlapping TFBS after filtering: 235</span>
<span class="co">## INFO [2022-01-21 13:18:01]  Calculating intersection for TF DLX4.0.D finished. Number of overlapping TFBS after filtering: 127</span>
<span class="co">## INFO [2022-01-21 13:18:01]  Calculating intersection for TF DLX6.0.D finished. Number of overlapping TFBS after filtering: 117</span>
<span class="co">## INFO [2022-01-21 13:18:02]  Calculating intersection for TF DMBX1.0.D finished. Number of overlapping TFBS after filtering: 132</span>
<span class="co">## INFO [2022-01-21 13:18:02]  Calculating intersection for TF DMRT1.0.D finished. Number of overlapping TFBS after filtering: 460</span>
<span class="co">## INFO [2022-01-21 13:18:03]  Calculating intersection for TF E2F1.0.A finished. Number of overlapping TFBS after filtering: 3117</span>
<span class="co">## INFO [2022-01-21 13:18:04]  Calculating intersection for TF E2F3.0.A finished. Number of overlapping TFBS after filtering: 1702</span>
<span class="co">## INFO [2022-01-21 13:18:05]  Calculating intersection for TF E2F4.0.A finished. Number of overlapping TFBS after filtering: 4214</span>
<span class="co">## INFO [2022-01-21 13:18:06]  Calculating intersection for TF E2F6.0.A finished. Number of overlapping TFBS after filtering: 5571</span>
<span class="co">## INFO [2022-01-21 13:18:07]  Calculating intersection for TF E2F7.0.B finished. Number of overlapping TFBS after filtering: 4742</span>
<span class="co">## INFO [2022-01-21 13:18:08]  Calculating intersection for TF COE1.0.A finished. Number of overlapping TFBS after filtering: 2350</span>
<span class="co">## INFO [2022-01-21 13:18:10]  Calculating intersection for TF EGR1.0.A finished. Number of overlapping TFBS after filtering: 8727</span>
<span class="co">## INFO [2022-01-21 13:18:13]  Calculating intersection for TF EGR2.0.A finished. Number of overlapping TFBS after filtering: 12510</span>
<span class="co">## INFO [2022-01-21 13:18:15]  Calculating intersection for TF EGR2.1.A finished. Number of overlapping TFBS after filtering: 8788</span>
<span class="co">## INFO [2022-01-21 13:18:16]  Calculating intersection for TF EHF.0.B finished. Number of overlapping TFBS after filtering: 4947</span>
<span class="co">## INFO [2022-01-21 13:18:17]  Calculating intersection for TF ELF1.0.A finished. Number of overlapping TFBS after filtering: 3497</span>
<span class="co">## INFO [2022-01-21 13:18:18]  Calculating intersection for TF ELF3.0.A finished. Number of overlapping TFBS after filtering: 5449</span>
<span class="co">## INFO [2022-01-21 13:18:19]  Calculating intersection for TF ELK3.0.D finished. Number of overlapping TFBS after filtering: 2171</span>
<span class="co">## INFO [2022-01-21 13:18:20]  Calculating intersection for TF ESR1.0.A finished. Number of overlapping TFBS after filtering: 1448</span>
<span class="co">## INFO [2022-01-21 13:18:21]  Calculating intersection for TF ESR1.1.A finished. Number of overlapping TFBS after filtering: 1604</span>
<span class="co">## INFO [2022-01-21 13:18:21]  Calculating intersection for TF ESR2.0.A finished. Number of overlapping TFBS after filtering: 1878</span>
<span class="co">## INFO [2022-01-21 13:18:23]  Calculating intersection for TF ESR2.1.A finished. Number of overlapping TFBS after filtering: 3875</span>
<span class="co">## INFO [2022-01-21 13:18:23]  Calculating intersection for TF ERR1.0.A finished. Number of overlapping TFBS after filtering: 1267</span>
<span class="co">## INFO [2022-01-21 13:18:24]  Calculating intersection for TF ETS1.0.A finished. Number of overlapping TFBS after filtering: 6255</span>
<span class="co">## INFO [2022-01-21 13:18:26]  Calculating intersection for TF ETS2.0.B finished. Number of overlapping TFBS after filtering: 7322</span>
<span class="co">## INFO [2022-01-21 13:18:27]  Calculating intersection for TF ETV2.0.B finished. Number of overlapping TFBS after filtering: 6413</span>
<span class="co">## INFO [2022-01-21 13:18:28]  Calculating intersection for TF ETV4.0.B finished. Number of overlapping TFBS after filtering: 5073</span>
<span class="co">## INFO [2022-01-21 13:18:30]  Calculating intersection for TF ETV5.0.C finished. Number of overlapping TFBS after filtering: 10335</span>
<span class="co">## INFO [2022-01-21 13:18:30]  Calculating intersection for TF FEZF1.0.C finished. Number of overlapping TFBS after filtering: 1030</span>
<span class="co">## INFO [2022-01-21 13:18:32]  Calculating intersection for TF FLI1.1.A finished. Number of overlapping TFBS after filtering: 8982</span>
<span class="co">## INFO [2022-01-21 13:18:32]  Calculating intersection for TF FOXA3.0.B finished. Number of overlapping TFBS after filtering: 485</span>
<span class="co">## INFO [2022-01-21 13:18:33]  Calculating intersection for TF FOXB1.0.D finished. Number of overlapping TFBS after filtering: 257</span>
<span class="co">## INFO [2022-01-21 13:18:34]  Calculating intersection for TF FOXC2.0.D finished. Number of overlapping TFBS after filtering: 676</span>
<span class="co">## INFO [2022-01-21 13:18:34]  Calculating intersection for TF FOXD2.0.D finished. Number of overlapping TFBS after filtering: 240</span>
<span class="co">## INFO [2022-01-21 13:18:35]  Calculating intersection for TF FOXD3.0.D finished. Number of overlapping TFBS after filtering: 958</span>
<span class="co">## INFO [2022-01-21 13:18:36]  Calculating intersection for TF FOXF1.0.D finished. Number of overlapping TFBS after filtering: 441</span>
<span class="co">## INFO [2022-01-21 13:18:37]  Calculating intersection for TF FOXO4.0.C finished. Number of overlapping TFBS after filtering: 392</span>
<span class="co">## INFO [2022-01-21 13:18:38]  Calculating intersection for TF FOXP1.0.A finished. Number of overlapping TFBS after filtering: 435</span>
<span class="co">## INFO [2022-01-21 13:18:38]  Calculating intersection for TF FOXP3.0.D finished. Number of overlapping TFBS after filtering: 358</span>
<span class="co">## INFO [2022-01-21 13:18:40]  Calculating intersection for TF FUBP1.0.D finished. Number of overlapping TFBS after filtering: 1034</span>
<span class="co">## INFO [2022-01-21 13:18:40]  Calculating intersection for TF EVI1.0.B finished. Number of overlapping TFBS after filtering: 243</span>
<span class="co">## INFO [2022-01-21 13:18:42]  Calculating intersection for TF COT1.0.C finished. Number of overlapping TFBS after filtering: 4789</span>
<span class="co">## INFO [2022-01-21 13:18:43]  Calculating intersection for TF COT1.1.C finished. Number of overlapping TFBS after filtering: 2968</span>
<span class="co">## INFO [2022-01-21 13:18:44]  Calculating intersection for TF COT2.0.A finished. Number of overlapping TFBS after filtering: 1403</span>
<span class="co">## INFO [2022-01-21 13:18:46]  Calculating intersection for TF COT2.1.A finished. Number of overlapping TFBS after filtering: 2539</span>
<span class="co">## INFO [2022-01-21 13:18:46]  Calculating intersection for TF BRAC.0.A finished. Number of overlapping TFBS after filtering: 740</span>
<span class="co">## INFO [2022-01-21 13:18:47]  Calculating intersection for TF AP2A.0.A finished. Number of overlapping TFBS after filtering: 2987</span>
<span class="co">## INFO [2022-01-21 13:18:48]  Calculating intersection for TF AP2B.0.B finished. Number of overlapping TFBS after filtering: 4197</span>
<span class="co">## INFO [2022-01-21 13:18:48]  Finished execution using 1 cores. TOTAL RUNNING TIME: 1.3 mins</span></code></pre>
<p>The output shows that 75 TFs were found in the specified input folder, along with the number of TFBS that overlap our peaks for each of them. We have now successfully added our TFs and TFBS to the <em>GRaNIE</em> object.</p>
</div>
<div class="section level3">
<h3 id="filter-data-optional">Filter data (optional)<a class="anchor" aria-label="anchor" href="#filter-data-optional"></a>
</h3>
<p>Optionally, we can filter both peaks and RNA-Seq data according to various criteria using the function <code><a href="../reference/filterData.html">filterData()</a></code>.</p>
<p>For the open chromatin peaks, we currently support three filters:</p>
<ol style="list-style-type: decimal">
<li>Filter by their normalized mean read counts (<em>minNormalizedMean_peaks</em>, default: 5)</li>
<li>Filter by their size / width (in bp), discarding peaks that exceed a particular threshold (<em>maxSize_peaks</em>, default: 10000 bp)</li>
<li>Filter by chromosome (only keep chromosomes that are provided as input to the function, <em>chrToKeep_peaks</em>)</li>
</ol>
<p>For RNA-seq, we currently support the same normalized mean count filter as described above for open chromatin (<em>minNormalizedMeanRNA</em>).</p>
<p>The default values are usually suitable for bulk data and should result in the removal of very few peaks / genes; for single-cell data, however, lowering them may be more reasonable. The output clearly reports how many peaks and genes have been filtered, so you can rerun the function with different values if needed.</p>
<p>For more parameter details, see the R help (<code><a href="../reference/filterData.html">?filterData</a></code>).</p>
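<p>To make the three peak filters more concrete, here is a minimal base R sketch on a made-up peak table. It is not the implementation used by <code>filterData()</code>: the data frame, its column names, and whether the thresholds are inclusive are assumptions for illustration only.</p>

```r
# Toy peak table; all values and column names are hypothetical
peaks = data.frame(
  peakID   = c("p1", "p2", "p3", "p4", "p5"),
  chr      = c("chr1", "chr2", "chrM", "chrX", "chr7"),
  start    = c(100, 5000, 200, 1000, 300),
  end      = c(600, 5400, 700, 13000, 800),
  meanNorm = c(12.3, 2.1, 40.0, 8.5, 6.0),
  stringsAsFactors = FALSE
)

# Thresholds named after the filterData() parameters described above
minNormalizedMean_peaks = 5
maxSize_peaks           = 10000
chrToKeep_peaks         = c(paste0("chr", 1:22), "chrX", "chrY")

peaks$width = peaks$end - peaks$start

# One logical vector per filter (inclusive thresholds are an assumption)
keep_mean = peaks$meanNorm >= minNormalizedMean_peaks
keep_size = maxSize_peaks >= peaks$width
keep_chr  = peaks$chr %in% chrToKeep_peaks

# A peak is retained only if it passes all three filters
peaks_filtered = peaks[keep_mean & keep_size & keep_chr, ]
print(peaks_filtered$peakID)
```

<p>In this toy table, one peak fails the mean filter, one lies on a chromosome that is not kept, and one exceeds the size limit, so each filter removes exactly one peak.</p>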
<div class="sourceCode" id="cb22"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="co"># Chromosomes to keep for peaks. This should be a vector of chromosome names</span>
<span class="va">chrToKeep_peaks</span> <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/paste.html" class="external-link">paste0</a></span><span class="op">(</span><span class="st">"chr"</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">22</span><span class="op">)</span>, <span class="st">"chrX"</span>, <span class="st">"chrY"</span><span class="op">)</span>
<span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/filterData.html">filterData</a></span><span class="op">(</span><span class="va">GRN</span>, minNormalizedMean_peaks <span class="op">=</span> <span class="fl">5</span>, minNormalizedMeanRNA <span class="op">=</span> <span class="fl">1</span>,
    chrToKeep_peaks <span class="op">=</span> <span class="va">chrToKeep_peaks</span>, maxSize_peaks <span class="op">=</span> <span class="fl">10000</span>, forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:18:49] FILTER PEAKS</span>
<span class="co">## INFO [2022-01-21 13:18:49]  Number of peaks before filtering : 75000</span>
<span class="co">## INFO [2022-01-21 13:18:49]   Filter peaks by CV: Min = 0</span>
<span class="co">## INFO [2022-01-21 13:18:49]   Filter peaks by mean: Min = 5</span>
<span class="co">## INFO [2022-01-21 13:18:49]  Number of peaks after filtering : 64008</span>
<span class="co">## INFO [2022-01-21 13:18:49]  Finished successfully. Execution time: 0.1 secs</span>
<span class="co">## INFO [2022-01-21 13:18:49] Filter and sort peaks and remain only those on the following chromosomes: chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY</span>
<span class="co">## INFO [2022-01-21 13:18:49] Filter and sort peaks by size and remain only those smaller than : 10000</span>
<span class="co">## INFO [2022-01-21 13:18:49]  Number of peaks before filtering: 75000</span>
<span class="co">## INFO [2022-01-21 13:18:49]  Number of peaks after filtering : 75000</span>
<span class="co">## INFO [2022-01-21 13:18:49]  Finished successfully. Execution time: 0.4 secs</span>
<span class="co">## INFO [2022-01-21 13:18:49] Collectively, filter 10992 out of 75000 peaks.</span>
<span class="co">## INFO [2022-01-21 13:18:49] Number of remaining peaks: 64008</span>
<span class="co">## INFO [2022-01-21 13:18:50] FILTER RNA-seq</span>
<span class="co">## INFO [2022-01-21 13:18:50]  Number of genes before filtering : 61534</span>
<span class="co">## INFO [2022-01-21 13:18:50]   Filter genes by CV: Min = 0</span>
<span class="co">## INFO [2022-01-21 13:18:50]   Filter genes by mean: Min = 1</span>
<span class="co">## INFO [2022-01-21 13:18:50]  Number of genes after filtering : 18924</span>
<span class="co">## INFO [2022-01-21 13:18:50]  Finished successfully. Execution time: 0.1 secs</span>
<span class="co">## INFO [2022-01-21 13:18:50]  Number of rows in total: 35033</span>
<span class="co">## INFO [2022-01-21 13:18:50]  Flagged 16211 rows because the row mean was smaller than 1</span></code></pre>
<p>We can see from the output that no peaks have been filtered due to their size and almost 11,000 have been filtered due to their low mean read counts, which collectively leaves around 64,000 of the original 75,000 peaks. For the RNA data, almost half of the data has been filtered (16,211 out of around 35,000 genes).</p>
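<p>The proportions quoted above can be checked directly from the numbers in the log output. The short sketch below only redoes this arithmetic; the variable names are made up for illustration.</p>

```r
# Numbers copied from the filterData() log output shown above
peaks_before = 75000
peaks_after  = 64008
rows_total   = 35033
rows_flagged = 16211

# Peaks removed by all peak filters combined
peaks_removed      = peaks_before - peaks_after
frac_peaks_removed = peaks_removed / peaks_before

# Fraction of RNA-seq rows flagged because of a low row mean
frac_rows_flagged = rows_flagged / rows_total

print(peaks_removed)
print(round(frac_peaks_removed, 3))
print(round(frac_rows_flagged, 3))
```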
</div>
<div class="section level3">
<h3 id="add-tf-peak-connections">Add TF-peak connections<a class="anchor" aria-label="anchor" href="#add-tf-peak-connections"></a>
</h3>
<p>We now have all necessary data in the object to start constructing our network. As explained in the <a href="packageDetails.html">Package Details Vignette</a>, we currently support two types of links for our <em>GRaNIE</em> approach:</p>
<ol style="list-style-type: decimal">
<li>TF - peak</li>
<li>peak - gene</li>
</ol>
<p>Let’s start with TF-peak links! For this, we employ the function <code><a href="../reference/addConnections_TF_peak.html">addConnections_TF_peak()</a></code>. By default, we use the Pearson correlation coefficient to calculate the correlations between TF expression and peak accessibility, but Spearman may sometimes be a better alternative, especially if the diagnostic plots show that the background does not look as expected.</p>
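<p>As a rough illustration of why the choice of correlation method can matter, the following base R sketch compares both coefficients on made-up data in which one sample has an extreme TF expression value. The vectors are hypothetical and unrelated to the example dataset.</p>

```r
# Toy data across 8 samples (made-up values); the last sample is an outlier
tf_expr  = c(1, 2, 3, 4, 5, 6, 7, 50)
peak_acc = c(2.0, 2.5, 3.1, 3.9, 5.2, 6.1, 7.3, 8.0)

# Pearson measures linear association and is sensitive to the outlier
cor_pearson = cor(tf_expr, peak_acc, method = "pearson")

# Spearman works on ranks, so only the ordering of the values matters
cor_spearman = cor(tf_expr, peak_acc, method = "spearman")

print(c(pearson = cor_pearson, spearman = cor_spearman))
```

<p>Because both vectors increase monotonically, the Spearman coefficient is exactly 1 here, whereas the Pearson coefficient is pulled below 1 by the outlier.</p>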
<p>In addition to creating TF-peak links based on TF expression, we can also correlate peak accessibility with other measures. We call this the <em>connection type</em>, and <em>expression</em> is the default one in our framework. However, we implemented a flexible way of supporting additional or alternative connection types. Briefly, this works as follows: additional data has to be imported beforehand under a particular name (the name of the <em>connection type</em>). For example, measures that are related to so-called <em>TF activity</em> can be used in addition to or as a replacement for TF <em>expression</em>. For each connection type that we want to include, we simply add it to the parameter <em>connectionTypes</em>, along with the binary vector <em>removeNegativeCorrelation</em> that specifies whether negatively correlated pairs should be removed. For expression, the default is to not remove them, while removal may be more reasonable for measures related to TF activity.</p>
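<p>The effect of removing negatively correlated pairs per connection type can be sketched on a toy table. This is only an illustration of the idea, not the package code: the connection type name <code>TF_activity</code>, the data frame, and the use of a named vector are all assumptions.</p>

```r
# Hypothetical TF-peak correlations for two connection types
links = data.frame(
  connectionType = c("expression", "expression", "TF_activity", "TF_activity"),
  r              = c(0.42, -0.37, 0.51, -0.29),
  stringsAsFactors = FALSE
)

# One entry per connection type; "TF_activity" is a made-up name
connectionTypes = c("expression", "TF_activity")
removeNegativeCorrelation = c(FALSE, TRUE)
names(removeNegativeCorrelation) = connectionTypes  # naming is an assumption

# Flag negative correlations only for types where removal is requested
to_remove = removeNegativeCorrelation[links$connectionType] & (0 > links$r)

links_kept = links[!to_remove, ]
print(links_kept)
```

<p>Only the negatively correlated <code>TF_activity</code> pair is dropped, while the negative <em>expression</em> pair is retained, mirroring the defaults described above.</p>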
<p>Lastly, we offer a so-called GC correction that compares the foreground with a GC-matched background instead of the full background. We are still investigating the plausibility and effects of this correction and therefore mark this feature as experimental for now.</p>
<p>This function may run for a while, and each time-consuming step has a built-in progress bar so the remaining time can be estimated. Note that the TF-peak links are constructed for both the original, non-permuted data (labeled as <em>original</em> in the corresponding output plots) and permuted data (<em>permuted</em>). For more parameter options and parameter details, see the R help (<code><a href="../reference/addConnections_TF_peak.html">?addConnections_TF_peak</a></code>).</p>
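<p>To give an intuition for why links are also computed on permuted data, here is a simplified base R sketch with simulated values. It shuffles the sample order of a TF expression vector to obtain a background; how <em>GRaNIE</em> permutes the data internally is not reproduced here, and all names and values are assumptions.</p>

```r
set.seed(1)
n_samples = 20
n_peaks   = 200

# Simulated TF expression and a peaks x samples accessibility matrix
tf_expr = rnorm(n_samples)
peaks   = matrix(rnorm(n_peaks * n_samples), nrow = n_peaks)

# Let the first 20 peaks follow the TF (add 2 * tf_expr to each of these rows)
peaks[1:20, ] = sweep(peaks[1:20, ], 2, 2 * tf_expr, "+")

# Permutation: shuffle the sample order of the TF expression vector
tf_permuted = sample(tf_expr)

# Correlate every peak with the original and with the permuted TF vector
cor_original = apply(peaks, 1, cor, y = tf_expr)
cor_permuted = apply(peaks, 1, cor, y = tf_permuted)

print(summary(cor_original[1:20]))
print(summary(cor_permuted[1:20]))
```

<p>The peaks that were simulated to follow the TF show high correlations in the original data, while the permuted data loses this signal and serves as a reference for what to expect by chance.</p>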
<div class="sourceCode" id="cb24"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/addConnections_TF_peak.html">addConnections_TF_peak</a></span><span class="op">(</span><span class="va">GRN</span>, plotDiagnosticPlots <span class="op">=</span> <span class="cn">TRUE</span>, connectionTypes <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="st">"expression"</span><span class="op">)</span>,
    corMethod <span class="op">=</span> <span class="st">"pearson"</span>, forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:18:50] </span>
<span class="co">## Real data</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:18:50] Calculate TF-peak links for connection type expression</span>
<span class="co">## INFO [2022-01-21 13:18:50]  Correlate expression and peak counts</span>
<span class="co">## INFO [2022-01-21 13:18:50]   Retain 59 rows from TF/gene data out of 18822 (filter non-TF genes and TF genes with 0 counts throughout and keep only unique ENSEMBL IDs).</span>
<span class="co">## INFO [2022-01-21 13:18:50]   Correlate TF/gene data for 59 unique Ensembl IDs (TFs) and peak counts for 64008 peaks.</span>
<span class="co">## INFO [2022-01-21 13:18:50]   Note: For subsequent steps, the same gene may be associated with multiple TF, depending on the translation table.</span>
<span class="co">## INFO [2022-01-21 13:18:52]   Finished successfully. Execution time: 1.9 secs</span>
<span class="co">## INFO [2022-01-21 13:18:52]  Run FDR calculations for 65 TFs for which TFBS predictions and expression data for the corresponding gene are available.</span>
<span class="co">## INFO [2022-01-21 13:18:52]   Skip the following 10 TF due to missing data: ATOH1.0.B,CDX1.0.C,CTCFL.0.A,DLX6.0.D,DMRT1.0.D,EHF.0.B,ESR2.0.A,ESR2.1.A,FOXA3.0.B,FOXB1.0.D</span>
<span class="co">## INFO [2022-01-21 13:18:52]   Compute FDR for each TF. This may take a while...</span>
<span class="co">## INFO [2022-01-21 13:19:00]   Finished successfully. Execution time: 10 secs</span>
<span class="co">## INFO [2022-01-21 13:19:00]  Finished successfully. Execution time: 10.1 secs</span>
<span class="co">## INFO [2022-01-21 13:19:00] </span>
<span class="co">## Permuted data</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:19:00] Shuffling rows per column</span>
<span class="co">## INFO [2022-01-21 13:19:01]  Finished successfully. Execution time: 0.5 secs</span>
<span class="co">## INFO [2022-01-21 13:19:01] Calculate TF-peak links for connection type expression</span>
<span class="co">## INFO [2022-01-21 13:19:01]  Correlate expression and peak counts</span>
<span class="co">## INFO [2022-01-21 13:19:01]   Retain 59 rows from TF/gene data out of 18822 (filter non-TF genes and TF genes with 0 counts throughout and keep only unique ENSEMBL IDs).</span>
<span class="co">## INFO [2022-01-21 13:19:01]   Correlate TF/gene data for 59 unique Ensembl IDs (TFs) and peak counts for 64008 peaks.</span>
<span class="co">## INFO [2022-01-21 13:19:01]   Note: For subsequent steps, the same gene may be associated with multiple TF, depending on the translation table.</span>
<span class="co">## INFO [2022-01-21 13:19:01]   Finished successfully. Execution time: 0.5 secs</span>
<span class="co">## INFO [2022-01-21 13:19:01]  Run FDR calculations for 65 TFs for which TFBS predictions and expression data for the corresponding gene are available.</span>
<span class="co">## INFO [2022-01-21 13:19:01]   Skip the following 10 TF due to missing data: ATOH1.0.B,CDX1.0.C,CTCFL.0.A,DLX6.0.D,DMRT1.0.D,EHF.0.B,ESR2.0.A,ESR2.1.A,FOXA3.0.B,FOXB1.0.D</span>
<span class="co">## INFO [2022-01-21 13:19:02]   Compute FDR for each TF. This may take a while...</span>
<span class="co">## INFO [2022-01-21 13:19:10]   Finished successfully. Execution time: 8.8 secs</span>
<span class="co">## INFO [2022-01-21 13:19:10]  Finished successfully. Execution time: 9.5 secs</span>
<span class="co">## INFO [2022-01-21 13:19:10] Plotting FDR curves for each TF to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_peak.fdrCurves_original.pdf</span>
<span class="co">## INFO [2022-01-21 13:19:10]  Including a total of 65 TF. Preparing plots...</span>
<span class="co">## INFO [2022-01-21 13:19:12]  Finished generating plots, start plotting to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_peak.fdrCurves_original.pdf. This may take a few minutes.</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:19:32] Finished writing plots to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_peak.fdrCurves_original.pdf</span>
<span class="co">## INFO [2022-01-21 13:19:32]  Finished successfully. Execution time: 22.2 secs</span>
<span class="co">## INFO [2022-01-21 13:19:32] Plotting FDR curves for each TF to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_peak.fdrCurves_permuted.pdf</span>
<span class="co">## INFO [2022-01-21 13:19:32]  Including a total of 65 TF. Preparing plots...</span>
<span class="co">## INFO [2022-01-21 13:19:34]  Finished generating plots, start plotting to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_peak.fdrCurves_permuted.pdf. This may take a few minutes.</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:19:56] Finished writing plots to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_peak.fdrCurves_permuted.pdf</span>
<span class="co">## INFO [2022-01-21 13:19:56]  Finished successfully. Execution time: 24 secs</span></code></pre>
<p>From the output, we see that a total of 65 TFs have RNA-seq data available and will consequently be included and correlated with peak accessibility. The output also lists the PDF files that were created, which we will look at next!</p>
</div>
<div class="section level3">
<h3 id="quality-control-2-diagnostic-plots-for-tf-peak-connections">Quality control 2: Diagnostic plots for TF-peak connections<a class="anchor" aria-label="anchor" href="#quality-control-2-diagnostic-plots-for-tf-peak-connections"></a>
</h3>
<p>After adding the TF-peak links to our <em>GRaNIE</em> object, let’s look at some diagnostic plots. The <em>plots</em> folder within the output folder that was specified when initializing the <em>GRaNIE</em> object should now contain two new files named <em>TF_peak.fdrCurves_original.pdf</em> and <em>TF_peak.fdrCurves_permuted.pdf</em>. For brevity and organization, we describe their interpretation in detail in the Introductory vignette rather than here.</p>
<!-- Here is what the plot for the original data for the TF *SP1.0.A* looks like: -->
<!-- <div align="center"> -->
<!-- <figure> -->
<!-- <img src="figs/TF-peak_diagnosticPlots1.png" height="600px"/> -->
<!-- <figcaption><i>Figure 3 - Diagnostic plots for TF-peak links for a particular TF, *SP1.0.A*.</i></figcaption> -->
<!-- </figure> -->
<!-- </div> -->
<!-- We can see that the TF-peak FDR is below 0.2 for positive correlation bins in the positive direction only (upper plot), while for the negative direction (lower plot), it is always above 0.4, regardless of the correlation bin. Here, correlation bin refers to the correlation of a particular SP1.0.A - peak pair that has been discretized accordingly (i.e., a correlation of 0.07 would go into (0.05-0.10] correlation bin)). Usually, depending on the mode of action of a TF, either one of the two directions may show a low FDR, but rarely both. Thus, positively correlated SP1.0.A - peak pairs seem to be significant and will be retained for the final *GRN* network. As an exercise, check how this plot looks like for the permuted data! Would you expect lower or higher FDRs? -->
</div>
<div class="section level3">
<h3 id="run-the-ar-classification-and-qc-optional">Run the AR classification and QC (optional)<a class="anchor" aria-label="anchor" href="#run-the-ar-classification-and-qc-optional"></a>
</h3>
<p>Transcription factors (TFs) regulate many cellular processes and can therefore serve as readouts of the signaling and regulatory state. Yet for many TFs, the mode of action (repressing or activating transcription of target genes) is unclear. In analogy to our recently published <em>diffTF</em> approach for calculating differential TF activity, TFs can be classified into putative transcriptional activators or repressors in an identical fashion from within the <em>GRaNIE</em> framework. This can be achieved with the function <code><a href="../reference/AR_classification_wrapper.html">AR_classification_wrapper()</a></code>.</p>
<p><strong>Note that this step is fully optional and can be skipped. The output of the function is not used for subsequent steps.</strong> To keep the memory footprint of the <em>GRaNIE</em> object low, we recommend setting <code>deleteIntermediateData = TRUE</code>.</p>
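<p>The general idea behind such a classification can be sketched in base R with simulated correlations for a single TF. This is a loose illustration under stated assumptions, not the logic of <code>AR_classification_wrapper()</code>: the simulated values, the use of the median difference as the quantity compared against the thresholds, and the order of the checks are all assumptions. The two threshold values are copied from the "Stringency 0.05" line of the real-data log output shown further below.</p>

```r
set.seed(42)

# Simulated correlations between one TF's expression and peak accessibility
cor_foreground = rnorm(300, mean = 0.08, sd = 0.15)   # peaks with a predicted TFBS
cor_background = rnorm(3000, mean = 0, sd = 0.15)     # all other peaks

# Shift of the foreground relative to the background (assumed test quantity)
median_diff = median(cor_foreground) - median(cor_background)

# Wilcoxon rank-sum test: do the two distributions differ?
p_value = wilcox.test(cor_foreground, cor_background)$p.value

significanceThreshold_Wilcoxon = 0.05
repressor_threshold = -0.0439   # from the log line "Stringency 0.05"
activator_threshold = 0.0195

# Non-significant TFs stay undetermined; otherwise the thresholds decide
classification = if (p_value >= significanceThreshold_Wilcoxon) {
  "undetermined"
} else if (median_diff > activator_threshold) {
  "activator"
} else if (repressor_threshold > median_diff) {
  "repressor"
} else {
  "undetermined"
}

print(classification)
```

<p>A TF whose foreground correlations are shifted towards positive values relative to the background would be called a putative activator in this sketch, and a shift towards negative values would indicate a putative repressor.</p>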
<div class="sourceCode" id="cb28"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/AR_classification_wrapper.html">AR_classification_wrapper</a></span><span class="op">(</span><span class="va">GRN</span>, significanceThreshold_Wilcoxon <span class="op">=</span> <span class="fl">0.05</span>,
    plot_minNoTFBS_heatmap <span class="op">=</span> <span class="fl">100</span>, plotDiagnosticPlots <span class="op">=</span> <span class="cn">TRUE</span>, deleteIntermediateData <span class="op">=</span> <span class="cn">TRUE</span>,
    forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:19:56]  Connection type expression</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:19:56]  Real data</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:19:56]  Correlate expression and peak counts</span>
<span class="co">## INFO [2022-01-21 13:19:56]  Retain 59 rows from TF/gene data out of 18822 (filter non-TF genes and TF genes with 0 counts throughout and keep only unique ENSEMBL IDs).</span>
<span class="co">## INFO [2022-01-21 13:19:56]  Correlate TF/gene data for 59 unique Ensembl IDs (TFs) and peak counts for 64008 peaks.</span>
<span class="co">## INFO [2022-01-21 13:19:56]  Note: For subsequent steps, the same gene may be associated with multiple TF, depending on the translation table.</span>
<span class="co">## INFO [2022-01-21 13:19:56]  Finished successfully. Execution time: 0.4 secs</span>
<span class="co">## INFO [2022-01-21 13:19:57] Compute foreground and background as well as their median values per TF</span>
<span class="co">## INFO [2022-01-21 13:19:59]  Finished successfully. Execution time: 2.1 secs</span>
<span class="co">## INFO [2022-01-21 13:19:59] Calculate classification thresholds for repressors / activators</span>
<span class="co">## INFO [2022-01-21 13:19:59]  Stringency 0.1: -0.0306 / 0.0157</span>
<span class="co">## INFO [2022-01-21 13:19:59]  Stringency 0.05: -0.0439 / 0.0195</span>
<span class="co">## INFO [2022-01-21 13:19:59]  Stringency 0.01: -0.0536 / 0.024</span>
<span class="co">## INFO [2022-01-21 13:20:00]  Stringency 0.001: -0.0563 / 0.028</span>
<span class="co">## INFO [2022-01-21 13:20:00]  Finished successfully. Execution time: 1 secs</span>
<span class="co">## INFO [2022-01-21 13:20:00] Finalize classification</span>
<span class="co">## INFO [2022-01-21 13:20:00]  Perform Wilcoxon test for each TF. This may take a few minutes.</span>
<span class="co">## INFO [2022-01-21 13:20:13]   Stringency 0.1</span>
<span class="co">## INFO [2022-01-21 13:20:13]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: AIRE.0.C,ASCL2.0.D,CUX2.0.D</span>
<span class="co">## INFO [2022-01-21 13:20:13]   Stringency 0.05</span>
<span class="co">## INFO [2022-01-21 13:20:13]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: AIRE.0.C</span>
<span class="co">## INFO [2022-01-21 13:20:13]   Stringency 0.01</span>
<span class="co">## INFO [2022-01-21 13:20:13]   Stringency 0.001</span>
<span class="co">## INFO [2022-01-21 13:20:13]  Summary of classification:</span>
<span class="co">## INFO [2022-01-21 13:20:13]   Column classification_q0.1_final</span>
<span class="co">## INFO [2022-01-21 13:20:13]    activator: 25,    undetermined: 17,    repressor: 23,    not-expressed: 10</span>
<span class="co">## INFO [2022-01-21 13:20:13]   Column classification_q0.05_final</span>
<span class="co">## INFO [2022-01-21 13:20:13]    activator: 23,    undetermined: 22,    repressor: 20,    not-expressed: 10</span>
<span class="co">## INFO [2022-01-21 13:20:13]   Column classification_q0.01_final</span>
<span class="co">## INFO [2022-01-21 13:20:13]    activator: 21,    undetermined: 27,    repressor: 17,    not-expressed: 10</span>
<span class="co">## INFO [2022-01-21 13:20:13]   Column classification_q0.001_final</span>
<span class="co">## INFO [2022-01-21 13:20:13]    activator: 19,    undetermined: 30,    repressor: 16,    not-expressed: 10</span>
<span class="co">## INFO [2022-01-21 13:20:13]  Finished successfully. Execution time: 13.3 secs</span>
<span class="co">## INFO [2022-01-21 13:20:13] Plotting density plots with foreground and background for each TF to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_classification_densityPlotsForegroundBackground_expression_original.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:20:15]  Finished successfully. Execution time: 1.4 secs</span>
<span class="co">## INFO [2022-01-21 13:20:15] Plotting AR summary plot to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_classification_stringencyThresholds_expression_original.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:20:15]  Finished successfully. Execution time: 0.1 secs</span>
<span class="co">## INFO [2022-01-21 13:20:15] Plotting AR heatmap to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_classification_summaryHeatmap_expression_original.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:20:16]  Finished successfully. Execution time: 1.1 secs</span>
<span class="co">## INFO [2022-01-21 13:20:16]  Permuted data</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:20:16]  Correlate expression and peak counts</span>
<span class="co">## INFO [2022-01-21 13:20:16]  Retain 59 rows from TF/gene data out of 18822 (filter non-TF genes and TF genes with 0 counts throughout and keep only unique ENSEMBL IDs).</span>
<span class="co">## INFO [2022-01-21 13:20:16]  Correlate TF/gene data for 59 unique Ensembl IDs (TFs) and peak counts for 64008 peaks.</span>
<span class="co">## INFO [2022-01-21 13:20:16]  Note: For subsequent steps, the same gene may be associated with multiple TF, depending on the translation table.</span>
<span class="co">## INFO [2022-01-21 13:20:18]  Finished successfully. Execution time: 1.6 secs</span>
<span class="co">## INFO [2022-01-21 13:20:18] Shuffling rows per column</span>
<span class="co">## INFO [2022-01-21 13:20:18]  Finished successfully. Execution time: 0.5 secs</span>
<span class="co">## INFO [2022-01-21 13:20:18] Compute foreground and background as well as their median values per TF</span>
<span class="co">## INFO [2022-01-21 13:20:19]  Finished successfully. Execution time: 1.1 secs</span>
<span class="co">## INFO [2022-01-21 13:20:19] Calculate classification thresholds for repressors / activators</span>
<span class="co">## INFO [2022-01-21 13:20:20]  Stringency 0.1: -0.0194 / 0.0193</span>
<span class="co">## INFO [2022-01-21 13:20:20]  Stringency 0.05: -0.0205 / 0.0244</span>
<span class="co">## INFO [2022-01-21 13:20:20]  Stringency 0.01: -0.0269 / 0.0317</span>
<span class="co">## INFO [2022-01-21 13:20:20]  Stringency 0.001: -0.0293 / 0.0317</span>
<span class="co">## INFO [2022-01-21 13:20:20]  Finished successfully. Execution time: 1.1 secs</span>
<span class="co">## INFO [2022-01-21 13:20:20] Finalize classification</span>
<span class="co">## INFO [2022-01-21 13:20:21]  Perform Wilcoxon test for each TF. This may take a few minutes.</span>
<span class="co">## INFO [2022-01-21 13:20:33]   Stringency 0.1</span>
<span class="co">## INFO [2022-01-21 13:20:33]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: ANDR.1.A,ANDR.2.A,ARI3A.0.D,BC11A.0.A,BCL6.0.A,BPTF.0.D,BRAC.0.A,BRCA1.0.D,DLX1.0.D,DMBX1.0.D,E2F7.0.B,ELF1.0.A,ELK3.0.D,EVI1.0.B,FOXD3.0.D</span>
<span class="co">## INFO [2022-01-21 13:20:33]   Stringency 0.05</span>
<span class="co">## INFO [2022-01-21 13:20:33]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: ANDR.1.A,ANDR.2.A,ARI3A.0.D,BC11A.0.A,BCL6.0.A,BRAC.0.A,BRCA1.0.D,DLX1.0.D,DMBX1.0.D,E2F7.0.B,ELK3.0.D,EVI1.0.B</span>
<span class="co">## INFO [2022-01-21 13:20:33]   Stringency 0.01</span>
<span class="co">## INFO [2022-01-21 13:20:33]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: ANDR.2.A,BRAC.0.A,DLX1.0.D</span>
<span class="co">## INFO [2022-01-21 13:20:33]   Stringency 0.001</span>
<span class="co">## INFO [2022-01-21 13:20:33]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: ANDR.2.A,DLX1.0.D</span>
<span class="co">## INFO [2022-01-21 13:20:33]  Summary of classification:</span>
<span class="co">## INFO [2022-01-21 13:20:33]   Column classification_q0.1_final</span>
<span class="co">## INFO [2022-01-21 13:20:33]    activator: 3,    undetermined: 62,    repressor: 0,    not-expressed: 10</span>
<span class="co">## INFO [2022-01-21 13:20:33]   Column classification_q0.05_final</span>
<span class="co">## INFO [2022-01-21 13:20:33]    activator: 1,    undetermined: 64,    repressor: 0,    not-expressed: 10</span>
<span class="co">## INFO [2022-01-21 13:20:33]   Column classification_q0.01_final</span>
<span class="co">## INFO [2022-01-21 13:20:33]    activator: 1,    undetermined: 64,    repressor: 0,    not-expressed: 10</span>
<span class="co">## INFO [2022-01-21 13:20:33]   Column classification_q0.001_final</span>
<span class="co">## INFO [2022-01-21 13:20:34]    activator: 1,    undetermined: 64,    repressor: 0,    not-expressed: 10</span>
<span class="co">## INFO [2022-01-21 13:20:34]  Finished successfully. Execution time: 13.1 secs</span>
<span class="co">## INFO [2022-01-21 13:20:34] Plotting density plots with foreground and background for each TF to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_classification_densityPlotsForegroundBackground_expression_permuted.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:20:35]  Finished successfully. Execution time: 1.4 secs</span>
<span class="co">## INFO [2022-01-21 13:20:35] Plotting AR summary plot to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_classification_stringencyThresholds_expression_permuted.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:20:35]  Finished successfully. Execution time: 0.1 secs</span>
<span class="co">## INFO [2022-01-21 13:20:35] Shuffling rows per column</span>
<span class="co">## INFO [2022-01-21 13:20:36]  Finished successfully. Execution time: 0.4 secs</span>
<span class="co">## INFO [2022-01-21 13:20:36] Plotting AR heatmap to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/TF_classification_summaryHeatmap_expression_permuted.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:20:38]  Finished successfully. Execution time: 1.9 secs</span></code></pre>
<p>From the output, we see that the classification has been run for both real and permuted data, as before. For the permuted data, almost all TFs are classified as <em>undetermined</em>, whereas for the real data, the majority of TFs are classified as either activator or repressor. This holds irrespective of the classification stringency. This is expected and reassuring, as it indicates that we capture real signal.</p>
<p>The content of these plots is identical to that of our <em>diffTF</em> software, and they are in fact generated with practically the same code. We refer to the following links for more details:</p>
<ol style="list-style-type: decimal">
<li><a href="https://doi.org/10.1016/j.celrep.2019.10.106" title="The original publication explaining the method and motivation in detail" class="external-link">The official <em>diffTF</em> paper</a></li>
<li>In general, the <a href="https://difftf.readthedocs.io" title="the *ReadTheDocs* help for *diffTF*" class="external-link">ReadTheDocs documentation</a>, and in particular <a href="https://difftf.readthedocs.io/en/latest/chapter2.html#files-comparisontype-diagnosticplotsclassification1-pdf-and-comparisontype-diagnosticplotsclassification2-pdf" title="the following part" class="external-link">this chapter</a>. In <em>File {comparisonType}.diagnosticPlotsClassification1.pdf:, pages 1-4</em>, the contents of the files <em>TF_classification_stringencyThresholds</em> are explained in detail, while in <em>File {comparisonType}.diagnosticPlotsClassification2.pdf:, Page 20 - end</em>, the contents of the files <em>TF_classification_summaryHeatmap</em> and <em>TF_classification_densityPlotsForegroundBackground</em> are elaborated upon.</li>
</ol>
<p>For more parameter details, see the R help (<code><a href="../reference/AR_classification_wrapper.html">?AR_classification_wrapper</a></code>).</p>
</div>
<div class="section level3">
<h3 id="save-granie-object-to-disk-optional">Save <em>GRaNIE</em> object to disk (optional)<a class="anchor" aria-label="anchor" href="#save-granie-object-to-disk-optional"></a>
</h3>
<p>After steps that take some time to run, it can make sense to store the <em>GRaNIE</em> object on disk so that it can be restored at any point. For example, the built-in R function <em>saveRDS</em> saves the object in the compressed <em>rds</em> format.</p>
<div class="sourceCode" id="cb36"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN_file_outputRDS</span> <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/paste.html" class="external-link">paste0</a></span><span class="op">(</span><span class="va">dir_output</span>, <span class="st">"/GRN.rds"</span><span class="op">)</span>
<span class="fu"><a href="https://rdrr.io/r/base/readRDS.html" class="external-link">saveRDS</a></span><span class="op">(</span><span class="va">GRN</span>, <span class="va">GRN_file_outputRDS</span><span class="op">)</span></code></pre></div>
<p>You can then read it back into R at any time with the following line:</p>
<p><code>GRN = readRDS(GRN_file_outputRDS)</code></p>
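<p>The round trip can be checked with a small self-contained sketch. The object <em>GRN_toy</em> and the file name below are hypothetical stand-ins chosen for illustration, not part of the vignette data; any R object behaves the same way.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode R"># Hypothetical stand-in for a GRaNIE object; any R object works the same way
GRN_toy = list(name = "toy", nPeaks = 3L)

# Write to a temporary directory so the sketch does not touch real output
GRN_file_toy = file.path(tempdir(), "GRN_toy.rds")
saveRDS(GRN_toy, GRN_file_toy)

# Only read the file if it exists, e.g. at the start of a new R session
if (file.exists(GRN_file_toy)) {
    GRN_restored = readRDS(GRN_file_toy)
}

# TRUE if the restored object equals the one that was saved
identical(GRN_toy, GRN_restored)</code></pre></div>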
</div>
<div class="section level3">
<h3 id="add-peak-gene-connections">Add peak-gene connections<a class="anchor" aria-label="anchor" href="#add-peak-gene-connections"></a>
</h3>
<p>Let’s now add the second type of connections, the peak-gene connections! This can be done via the function <code><a href="../reference/addConnections_peak_gene.html">addConnections_peak_gene()</a></code>.</p>
<p>Note that to make the function run faster, we restrict the maximum peak-gene distance to 10,000 bp here, while the default is 250,000 bp.</p>
<!-- TODO -->
<!-- Type of overlap for gene: Either "TSS" or "full". If "full", any extended peak-gene overlap is taken, regardless of where in the gene it occurs -->
<!-- If set to "TSS", only overlap of extended peaks with the TSS of the gene (assumed to be at the 5' position) is considered -->
<!-- Until 09.09.20, "full" was being used by default, this parameter did not exist before -->
<!-- Only relevant when no TAD domains are provided; if TADs are provided, this parameter can be ignored. -->
<!-- Specifies the neighborhood size in bp (for both upstream and downstream of the peak) for peaks to find genes in vicinity and associate/correlate genes with peaks -->
<!-- Default value 250000, here set to a smaller value to decrease running time -->
<p>For more parameter details, see the R help (<code><a href="../reference/addConnections_peak_gene.html">?addConnections_peak_gene</a></code>).</p>
<div class="sourceCode" id="cb37"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/addConnections_peak_gene.html">addConnections_peak_gene</a></span><span class="op">(</span><span class="va">GRN</span>, overlapTypeGene <span class="op">=</span> <span class="st">"TSS"</span>, corMethod <span class="op">=</span> <span class="st">"pearson"</span>,
    promoterRange <span class="op">=</span> <span class="fl">10000</span>, TADs <span class="op">=</span> <span class="cn">NULL</span>, nCores <span class="op">=</span> <span class="fl">1</span>, plotDiagnosticPlots <span class="op">=</span> <span class="cn">TRUE</span>, plotGeneTypes <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/list.html" class="external-link">list</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="st">"all"</span><span class="op">)</span><span class="op">)</span>,
    forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:20:38] </span>
<span class="co">## Real data</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:20:38] Calculate peak-gene correlations for neighborhood size 10000</span>
<span class="co">## INFO [2022-01-21 13:20:38] Calculate peak gene overlaps...</span>
<span class="co">## INFO [2022-01-21 13:20:38] Extend peaks based on user-defined extension size of 10000 up- and downstream.</span>
<span class="co">## INFO [2022-01-21 13:20:38] Reading pre-compiled genome annotation data </span>
<span class="co">## INFO [2022-01-21 13:20:38]  Finished successfully. Execution time: 0.5 secs</span>
<span class="co">## INFO [2022-01-21 13:20:38]  Iterate through 41912 peak-gene combinations and (if possible) calculate correlations using 1 cores. This may take a few minutes.</span>
<span class="co">## INFO [2022-01-21 13:20:47]  Finished execution using 1 cores. TOTAL RUNNING TIME: 8.8 secs</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:20:47]  Finished with calculating correlations, creating final data frame and filter NA rows due to missing RNA-seq data</span>
<span class="co">## INFO [2022-01-21 13:20:47]  Initial number of rows: 41912</span>
<span class="co">## INFO [2022-01-21 13:20:47]  Finished. Final number of rows: 18804</span>
<span class="co">## INFO [2022-01-21 13:20:47]  Finished successfully. Execution time: 9.6 secs</span>
<span class="co">## INFO [2022-01-21 13:20:47] </span>
<span class="co">## Permuted data</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:20:47] Calculate random peak-gene correlations for neighborhood size 10000</span>
<span class="co">## INFO [2022-01-21 13:20:47] Calculate peak gene overlaps...</span>
<span class="co">## INFO [2022-01-21 13:20:47] Extend peaks based on user-defined extension size of 10000 up- and downstream.</span>
<span class="co">## INFO [2022-01-21 13:20:48] Reading pre-compiled genome annotation data </span>
<span class="co">## INFO [2022-01-21 13:20:48]  Finished successfully. Execution time: 0.4 secs</span>
<span class="co">## INFO [2022-01-21 13:20:48]  Randomize gene-peak links by shuffling the peak IDs.</span>
<span class="co">## INFO [2022-01-21 13:20:48]  Iterate through 41912 peak-gene combinations and (if possible) calculate correlations using 1 cores. This may take a few minutes.</span>
<span class="co">## INFO [2022-01-21 13:20:57]  Finished execution using 1 cores. TOTAL RUNNING TIME: 9 secs</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:20:57]  Finished with calculating correlations, creating final data frame and filter NA rows due to missing RNA-seq data</span>
<span class="co">## INFO [2022-01-21 13:20:57]  Initial number of rows: 41912</span>
<span class="co">## INFO [2022-01-21 13:20:57]  Finished. Final number of rows: 18804</span>
<span class="co">## INFO [2022-01-21 13:20:57]  Finished successfully. Execution time: 9.9 secs</span>
<span class="co">## INFO [2022-01-21 13:20:57] Plotting diagnostic plots for peak-gene correlations to file(s) with basename /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/peakGene_diagnosticPlots_</span>
<span class="co">## INFO [2022-01-21 13:20:58]  Gene type all</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:21:13]  Finished successfully. Execution time: 15.6 secs</span></code></pre>
<p>We see from the output that almost 42,000 peak-gene links (41,912) have been identified that match our parameters (here: a maximum peak-gene distance of 10 kb). Of these, however, only 18,804 had corresponding RNA-seq data available, while RNA-seq data was missing or had been filtered out for the others. This is a rather typical case, as not all known and annotated genes are included in the RNA-seq data in the first place. As before, the correlations have also been calculated for the permuted data.</p>
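<p>To illustrate why the number of rows shrinks, here is a minimal base R sketch of the idea: each candidate peak-gene pair is correlated across samples, and pairs whose gene has no RNA-seq data yield <em>NA</em> and are dropped. All matrices and names are invented toy data, and the sketch is a simplification, not the actual implementation inside <em>GRaNIE</em>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode R">set.seed(1)
nSamples = 10

# Toy signal matrices (rows = features, columns = samples); all names are made up
peaks = matrix(rnorm(3 * nSamples), nrow = 3,
               dimnames = list(c("peak1", "peak2", "peak3"), NULL))
genes = matrix(rnorm(2 * nSamples), nrow = 2,
               dimnames = list(c("geneA", "geneB"), NULL))

# Candidate peak-gene pairs within the distance window; geneC has no RNA-seq data
pairs = data.frame(peak = c("peak1", "peak2", "peak3", "peak3"),
                   gene = c("geneA", "geneB", "geneC", "geneA"),
                   stringsAsFactors = FALSE)

# Pearson correlation per pair, NA when the gene is absent from the RNA-seq matrix
pairs$r = vapply(seq_len(nrow(pairs)), function(i) {
    if (!(pairs$gene[i] %in% rownames(genes))) return(NA_real_)
    cor(peaks[pairs$peak[i], ], genes[pairs$gene[i], ], method = "pearson")
}, numeric(1))

# Drop rows without a correlation, analogous to the NA filtering reported in the log
pairs_final = pairs[!is.na(pairs$r), ]
nrow(pairs_final)</code></pre></div>
<p>In this toy example, one of the four candidate pairs is removed because its gene is missing from the expression matrix.</p>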
</div>
<div class="section level3">
<h3 id="quality-control-3-diagnostic-plots-for-peak-gene-connections">Quality control 3: Diagnostic plots for peak-gene connections<a class="anchor" aria-label="anchor" href="#quality-control-3-diagnostic-plots-for-peak-gene-connections"></a>
</h3>
<p>Let’s now check some diagnostic plots for the peak-gene connections. As with the diagnostic plots we have already encountered, their interpretation and meaning are described in detail in the Introductory Vignette.</p>
</div>
<div class="section level3">
<h3 id="combine-tf-peak-and-peak-gene-connections-and-filter">Combine TF-peak and peak-gene connections and filter<a class="anchor" aria-label="anchor" href="#combine-tf-peak-and-peak-gene-connections-and-filter"></a>
</h3>
<p>Now that we have added both TF-peak and peak-gene links to our <em>GRaNIE</em> object, we are ready to filter and combine them. So far, they are stored separately in the object for various reasons (see the Introductory Vignette for details), but ultimately, we aim to combine them to derive TF-peak-gene connections. To do so, we can simply run the <code><a href="../reference/filterGRNAndConnectGenes.html">filterGRNAndConnectGenes()</a></code> function and filter the individual TF-peak and peak-gene links to our liking. The function has many more arguments, and we only specify a few in the example below. As before, we get a <em>GRaNIE</em> object back that now contains the merged and filtered TF-peak-gene connections that we can later extract. Some of the filters apply to the TF-peak links and some to the peak-gene links; the parameter name indicates which.</p>
<div class="sourceCode" id="cb40"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/filterGRNAndConnectGenes.html">filterGRNAndConnectGenes</a></span><span class="op">(</span><span class="va">GRN</span>, TF_peak.fdr.threshold <span class="op">=</span> <span class="fl">0.2</span>, peak_gene.fdr.threshold <span class="op">=</span> <span class="fl">0.2</span>,
    peak_gene.fdr.method <span class="op">=</span> <span class="st">"BH"</span>, gene.types <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="st">"protein_coding"</span>, <span class="st">"lincRNA"</span><span class="op">)</span>, allowMissingTFs <span class="op">=</span> <span class="cn">FALSE</span>,
    allowMissingGenes <span class="op">=</span> <span class="cn">FALSE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:21:13] Filter GRN network</span>
<span class="co">## INFO [2022-01-21 13:21:13] </span>
<span class="co">## </span>
<span class="co">## Real data</span>
<span class="co">## INFO [2022-01-21 13:21:13] Inital number of rows left before all filtering steps: 23096</span>
<span class="co">## INFO [2022-01-21 13:21:13]  Filter network and retain only rows with TF-peak connections with an FDR &lt; 0.2</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of TF-peak rows before filtering TFs: 23096</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of TF-peak rows after filtering TFs: 4907</span>
<span class="co">## INFO [2022-01-21 13:21:13] 2. Filter peak-gene connections</span>
<span class="co">## INFO [2022-01-21 13:21:13]  Filter genes by gene type, keep only the following gene types: protein_coding, lincRNA</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of peak-gene rows before filtering by gene type: 18828</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of peak-gene rows after filtering by gene type: 18734</span>
<span class="co">## INFO [2022-01-21 13:21:13] 3. Merging TF-peak with peak-gene connections and filter the combined table...</span>
<span class="co">## INFO [2022-01-21 13:21:13] Inital number of rows left before all filtering steps: 5955</span>
<span class="co">## INFO [2022-01-21 13:21:13]  Filter rows with missing ENSEMBL IDs</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of rows before filtering: 5955</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of rows after filtering: 4002</span>
<span class="co">## INFO [2022-01-21 13:21:13]  Filter network and retain only rows with peak_gene.r in the following interval: (0 - 1]</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of rows before filtering TFs: 4002</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of rows after filtering TFs: 2364</span>
<span class="co">## INFO [2022-01-21 13:21:13]  Calculate FDR based on remaining rows, filter network and retain only rows with peak-gene connections with an FDR &lt; 0.2</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of rows before filtering genes (including NA): 2364</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of rows before filtering genes (excluding NA): 2364</span>
<span class="co">## INFO [2022-01-21 13:21:13]   Number of rows after filtering genes (including NA): 626</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of rows after filtering genes (excluding NA): 626</span>
<span class="co">## INFO [2022-01-21 13:21:14] Final number of rows left after all filtering steps: 626</span>
<span class="co">## INFO [2022-01-21 13:21:14]  Finished successfully. Execution time: 0.6 secs</span>
<span class="co">## INFO [2022-01-21 13:21:14] </span>
<span class="co">## </span>
<span class="co">## Permuted data</span>
<span class="co">## INFO [2022-01-21 13:21:14] Inital number of rows left before all filtering steps: 62</span>
<span class="co">## INFO [2022-01-21 13:21:14]  Filter network and retain only rows with TF-peak connections with an FDR &lt; 0.2</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of TF-peak rows before filtering TFs: 62</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of TF-peak rows after filtering TFs: 24</span>
<span class="co">## INFO [2022-01-21 13:21:14] 2. Filter peak-gene connections</span>
<span class="co">## INFO [2022-01-21 13:21:14]  Filter genes by gene type, keep only the following gene types: protein_coding, lincRNA</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of peak-gene rows before filtering by gene type: 18828</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of peak-gene rows after filtering by gene type: 18734</span>
<span class="co">## INFO [2022-01-21 13:21:14] 3. Merging TF-peak with peak-gene connections and filter the combined table...</span>
<span class="co">## INFO [2022-01-21 13:21:14] Inital number of rows left before all filtering steps: 26</span>
<span class="co">## INFO [2022-01-21 13:21:14]  Filter rows with missing ENSEMBL IDs</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of rows before filtering: 26</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of rows after filtering: 8</span>
<span class="co">## INFO [2022-01-21 13:21:14]  Filter network and retain only rows with peak_gene.r in the following interval: (0 - 1]</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of rows before filtering TFs: 8</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of rows after filtering TFs: 3</span>
<span class="co">## INFO [2022-01-21 13:21:14]  Calculate FDR based on remaining rows, filter network and retain only rows with peak-gene connections with an FDR &lt; 0.2</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of rows before filtering genes (including NA): 3</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of rows before filtering genes (excluding NA): 3</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of rows after filtering genes (including NA): 2</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Number of rows after filtering genes (excluding NA): 2</span>
<span class="co">## INFO [2022-01-21 13:21:14] Final number of rows left after all filtering steps: 2</span>
<span class="co">## INFO [2022-01-21 13:21:14]  Finished successfully. Execution time: 1.1 secs</span></code></pre>
<p>The output shows, for both real and permuted data, the number of links before and after applying each filter. As expected and reassuringly, almost no connections remain for the permuted data (2 in the log above), while the real data retains 626 connections.</p>
<p>For more parameter details, see the R help (<code><a href="../reference/filterGRNAndConnectGenes.html">?filterGRNAndConnectGenes</a></code>).</p>
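<p>The peak-gene part of the filtering reported in the log (restrict the correlation to positive values, compute the FDR on the remaining rows, then apply the threshold) can be sketched in base R as follows. The table and its values are invented, the column names merely mimic those of the connection table, and the sketch is a simplified illustration using <code>p.adjust()</code>, not the code that <code>filterGRNAndConnectGenes()</code> runs internally.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode R"># Toy peak-gene table; all values are invented for illustration
peak_gene = data.frame(
    peak_gene.r     = c(0.8, 0.6, -0.7, 0.1, 0.5),
    peak_gene.p_raw = c(0.001, 0.02, 0.003, 0.6, 0.04)
)

# Step 1: keep only positively correlated links, i.e. r in the interval (0, 1]
kept = peak_gene[peak_gene$peak_gene.r > 0, ]

# Step 2: adjust the raw p-values of the remaining rows (Benjamini-Hochberg)
kept$peak_gene.p_adj = p.adjust(kept$peak_gene.p_raw, method = "BH")

# Step 3: retain only rows whose adjusted p-value is below the FDR threshold
fdr_threshold = 0.2
filtered = kept[fdr_threshold > kept$peak_gene.p_adj, ]
nrow(filtered)</code></pre></div>
<p>Because the adjustment is calculated on the rows that remain after the earlier filters, the order of the steps influences the adjusted values.</p>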
</div>
<div class="section level3">
<h3 id="add-tf-gene-correlations-optional">Add TF-gene correlations (optional)<a class="anchor" aria-label="anchor" href="#add-tf-gene-correlations-optional"></a>
</h3>
<p>Optionally, we can also include extra columns about the correlation of TF and genes directly. So far, only TF-peaks and peak-genes have been correlated, but not directly TFs and genes. Based on a filtered set of TF-peak-gene connections, the function <code><a href="../reference/add_TF_gene_correlation.html">add_TF_gene_correlation()</a></code> calculates the TF-gene correlation for each connection from the filtered set for which the TF is not missing.</p>
<div class="sourceCode" id="cb42"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/add_TF_gene_correlation.html">add_TF_gene_correlation</a></span><span class="op">(</span><span class="va">GRN</span>, corMethod <span class="op">=</span> <span class="st">"pearson"</span>, nCores <span class="op">=</span> <span class="fl">1</span>, forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:21:14] Calculate correlations for TF and genes from the filtered set of connections</span>
<span class="co">## INFO [2022-01-21 13:21:14]  Real data</span>
<span class="co">## INFO [2022-01-21 13:21:14]   Iterate through 582 TF-gene combinations and (if possible) calculate correlations using 1 cores. This may take a few minutes.</span>
<span class="co">## INFO [2022-01-21 13:21:16]  Finished execution using 1 cores. TOTAL RUNNING TIME: 1.4 secs</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:21:16]   Done. Construct the final table, this may result in an increased number of TF-gene pairs due to different TF names linked to the same Ensembl ID.</span>
<span class="co">## INFO [2022-01-21 13:21:16]  Permuted data</span>
<span class="co">## INFO [2022-01-21 13:21:16]   Iterate through 2 TF-gene combinations and (if possible) calculate correlations using 1 cores. This may take a few minutes.</span>
<span class="co">## INFO [2022-01-21 13:21:17]  Finished execution using 1 cores. TOTAL RUNNING TIME: 1 secs</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:21:17]   Done. Construct the final table, this may result in an increased number of TF-gene pairs due to different TF names linked to the same Ensembl ID.</span></code></pre>
<p>As can be seen from the output, the Pearson correlation has been calculated for 582 TF-gene pairs from the 626 filtered connections obtained above. The number of pairs is smaller than the number of connections, presumably because the same TF-gene pair can be connected via more than one peak. Note that we set <em>allowMissingGenes = FALSE</em> above, so every retained connection has a gene. With <em>allowMissingGenes = TRUE</em>, a TF-peak connection below the specified significance threshold for which no corresponding gene could be found that connects to the same peak would be kept with the gene set to <em>NA</em> rather than being excluded altogether.</p>
<p>For more parameter details, see the R help (<code><a href="../reference/add_TF_gene_correlation.html">?add_TF_gene_correlation</a></code>).</p>
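<p>One reason the number of TF-gene pairs can be smaller than the number of TF-peak-gene connections is that several peaks may link the same TF to the same gene. The following base R sketch, using an invented toy table, illustrates this; it is an assumption about the general principle and not a description of how <em>GRaNIE</em> derives the pairs internally.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode R"># Toy TF-peak-gene connections; TF1 reaches geneA via two different peaks
connections = data.frame(TF   = c("TF1", "TF1", "TF2"),
                         peak = c("peak1", "peak2", "peak3"),
                         gene = c("geneA", "geneA", "geneB"),
                         stringsAsFactors = FALSE)

# Unique TF-gene combinations, ignoring the connecting peak
TF_gene_pairs = unique(connections[, c("TF", "gene")])

nrow(connections)    # number of TF-peak-gene connections
nrow(TF_gene_pairs)  # number of distinct TF-gene pairs</code></pre></div>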
</div>
<div class="section level3">
<h3 id="retrieve-filtered-connections">Retrieve filtered connections<a class="anchor" aria-label="anchor" href="#retrieve-filtered-connections"></a>
</h3>
<p>We are now ready to retrieve the connections and the additional data we added to them. This can be done with the helper function <code><a href="../reference/getGRNConnections.html">getGRNConnections()</a></code>, which retrieves a data frame from a particular slot of a <em>GRaNIE</em> object. Here, we specify <em>all.filtered</em>, as we want to retrieve all filtered connections. Note that, for the first time, we assign the return value of the function to a different variable (i.e., <em>GRN_connections.all</em> and NOT <em>GRN</em> as before). This is important, as we would otherwise overwrite our <em>GRaNIE</em> object altogether! All <em>get</em> functions from the <em>GRaNIE</em> package return an element from within the object and NOT the object itself, so please keep that in mind and always check what a function returns before running it. You can do so in the R help (<code><a href="../reference/getGRNConnections.html">?getGRNConnections</a></code>), which also describes the parameters in more detail.</p>
<div class="sourceCode" id="cb44"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN_connections.all</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/getGRNConnections.html">getGRNConnections</a></span><span class="op">(</span><span class="va">GRN</span>, type <span class="op">=</span> <span class="st">"all.filtered"</span>, include_TF_gene_correlations <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span>

<span class="va">GRN_connections.all</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## <span style="color: #949494;"># A tibble: 626 × 32</span></span>
<span class="co">##    TF.name   TF.ENSEMBL     TF_peak.r_bin TF_peak.r TF_peak.fdr TF_peak.fdr_orig</span>
<span class="co">##    <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span>     <span style="color: #949494; font-style: italic;">&lt;fct&gt;</span>          <span style="color: #949494; font-style: italic;">&lt;fct&gt;</span>             <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>       <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span>            <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span></span>
<span class="co">## <span style="color: #BCBCBC;"> 1</span> BATF3.0.B ENSG000001236… [0.65,0.7)        0.684       0.185            0.185</span>
<span class="co">## <span style="color: #BCBCBC;"> 2</span> E2F6.0.A  ENSG000001690… [0.55,0.6)        0.550       0.156            0.156</span>
<span class="co">## <span style="color: #BCBCBC;"> 3</span> E2F6.0.A  ENSG000001690… [0.5,0.55)        0.514       0.175            0.175</span>
<span class="co">## <span style="color: #BCBCBC;"> 4</span> E2F6.0.A  ENSG000001690… [0.5,0.55)        0.539       0.175            0.175</span>
<span class="co">## <span style="color: #BCBCBC;"> 5</span> E2F6.0.A  ENSG000001690… [0.5,0.55)        0.539       0.175            0.175</span>
<span class="co">## <span style="color: #BCBCBC;"> 6</span> E2F6.0.A  ENSG000001690… [0.45,0.5)        0.494       0.191            0.191</span>
<span class="co">## <span style="color: #BCBCBC;"> 7</span> E2F6.0.A  ENSG000001690… [0.55,0.6)        0.585       0.156            0.156</span>
<span class="co">## <span style="color: #BCBCBC;"> 8</span> E2F6.0.A  ENSG000001690… [0.5,0.55)        0.501       0.175            0.175</span>
<span class="co">## <span style="color: #BCBCBC;"> 9</span> E2F6.0.A  ENSG000001690… [0.65,0.7)        0.663       0.166            0.166</span>
<span class="co">## <span style="color: #BCBCBC;">10</span> E2F6.0.A  ENSG000001690… [0.5,0.55)        0.528       0.175            0.175</span>
<span class="co">## <span style="color: #949494;"># … with 616 more rows, and 26 more variables: TF_peak.fdr_direction &lt;fct&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   TF_peak.connectionType &lt;fct&gt;, peak.ID &lt;fct&gt;, peak.mean &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   peak.median &lt;dbl&gt;, peak.CV &lt;dbl&gt;, peak.annotation &lt;fct&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   peak.GC.perc &lt;dbl&gt;, peak.width &lt;int&gt;, peak.GC.class &lt;ord&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   peak_gene.distance &lt;int&gt;, peak_gene.r &lt;dbl&gt;, peak_gene.p_raw &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   peak_gene.p_adj &lt;dbl&gt;, gene.ENSEMBL &lt;fct&gt;, gene.mean &lt;dbl&gt;,</span></span>
<span class="co">## <span style="color: #949494;">#   gene.median &lt;dbl&gt;, gene.CV &lt;dbl&gt;, gene.chr &lt;fct&gt;, gene.start &lt;int&gt;, …</span></span></code></pre>
<p>The table contains a total of 32 columns, and the prefix of each column name indicates the part of the <em>eGRN</em> network that the column refers to (e.g., TFs, TF-peaks, peaks, peak-genes or genes, or TF-gene if the function <code><a href="../reference/add_TF_gene_correlation.html">add_TF_gene_correlation()</a></code> has been run before). Data are stored in a format that minimizes the memory footprint (e.g., most character columns are stored as factors). This table can now be used for any downstream analysis, as it is just a normal data frame.</p>
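<p>As a small example of such a downstream analysis, the following base R sketch counts the number of connections per TF and converts a factor column back to character. The data frame <em>GRN_connections.toy</em> is an invented stand-in that only borrows a few column names from the table above; its peak and gene identifiers are placeholders.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode R"># Invented stand-in for the connection table; identifiers are placeholders
GRN_connections.toy = data.frame(
    TF.name      = c("E2F6.0.A", "E2F6.0.A", "BATF3.0.B"),
    peak.ID      = factor(c("peakToy1", "peakToy2", "peakToy3")),
    gene.ENSEMBL = factor(c("geneToy1", "geneToy2", "geneToy1")),
    TF_peak.r    = c(0.55, 0.51, 0.68),
    stringsAsFactors = FALSE
)

# Number of connections per TF, most connected TF first
nConnections = sort(table(GRN_connections.toy$TF.name), decreasing = TRUE)
nConnections

# Factor columns can be converted back to character when needed
GRN_connections.toy$gene.ENSEMBL = as.character(GRN_connections.toy$gene.ENSEMBL)</code></pre></div>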
</div>
<div class="section level3">
<h3 id="visualize-the-filtered-egrn-connections">Visualize the filtered <em>eGRN</em> connections<a class="anchor" aria-label="anchor" href="#visualize-the-filtered-egrn-connections"></a>
</h3>
<p>The <em>GRaNIE</em> package will soon also offer some rudimentary functions to visualize a filtered <em>eGRN</em> network. Stay tuned! Meanwhile, you can use the <em>igraph</em> package to construct a graph out of the filtered TF-peak-gene connection table (see above).</p>
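<p>A minimal sketch of how such a graph could be built is shown below. It assumes a connection table with the columns <em>TF.name</em>, <em>peak.ID</em> and <em>gene.ENSEMBL</em> (here replaced by an invented toy table), turns every TF-peak and peak-gene link into an edge, and passes the edge list to <code>igraph::graph_from_data_frame()</code>. The variable names and the edge attribute <em>type</em> are illustrative choices, and the <em>igraph</em> step is only executed if the package is installed.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode R"># Invented toy connections; in practice this would be the filtered connection table
connections = data.frame(
    TF.name      = c("TF1", "TF1", "TF2"),
    peak.ID      = c("peak1", "peak2", "peak2"),
    gene.ENSEMBL = c("geneA", "geneB", "geneB"),
    stringsAsFactors = FALSE
)

# One edge per TF-peak link and one per peak-gene link; duplicated edges are removed
edges = unique(rbind(
    data.frame(from = connections$TF.name, to = connections$peak.ID,
               type = "TF-peak", stringsAsFactors = FALSE),
    data.frame(from = connections$peak.ID, to = connections$gene.ENSEMBL,
               type = "peak-gene", stringsAsFactors = FALSE)
))

# Build a directed graph; the third column is stored as an edge attribute
if (requireNamespace("igraph", quietly = TRUE)) {
    g = igraph::graph_from_data_frame(edges, directed = TRUE)
    print(igraph::vcount(g))  # number of vertices (TFs, peaks and genes)
    print(igraph::ecount(g))  # number of edges
}</code></pre></div>
<p>Keeping the link type as an edge attribute makes it possible to style TF-peak and peak-gene edges differently when plotting.</p>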
</div>
<div class="section level3">
<h3 id="generate-a-connection-summary">Generate a connection summary<a class="anchor" aria-label="anchor" href="#generate-a-connection-summary"></a>
</h3>
<p>It is often useful to get a grasp of the general connectivity of a network and the number of connections that survive the filtering. This makes it possible to make an informed decision about which FDR to choose for TF-peak and peak-gene links, depending on how many links are retained and how many connections are needed for downstream analysis. To facilitate and automate this, we offer the convenience function <code><a href="../reference/generateStatsSummary.html">generateStatsSummary()</a></code>, which in essence iterates over different combinations of filtering parameters, calls the function <code><a href="../reference/filterGRNAndConnectGenes.html">filterGRNAndConnectGenes()</a></code> once for each of them, and records various connectivity statistics. Note that running this function may take a while. Afterwards, we can graphically summarize the result as either a heatmap or a boxplot by calling the function <code><a href="../reference/plot_stats_connectionSummary.html">plot_stats_connectionSummary()</a></code>. For more parameter details, see the R help (<code><a href="../reference/generateStatsSummary.html">?generateStatsSummary</a></code> and <code><a href="../reference/plot_stats_connectionSummary.html">?plot_stats_connectionSummary</a></code>).</p>
<div class="sourceCode" id="cb46"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/generateStatsSummary.html">generateStatsSummary</a></span><span class="op">(</span><span class="va">GRN</span>, TF_peak.fdr <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="fl">0.01</span>, <span class="fl">0.05</span>, <span class="fl">0.1</span>, <span class="fl">0.2</span><span class="op">)</span>, TF_peak.connectionTypes <span class="op">=</span> <span class="st">"all"</span>,
    peak_gene.p_raw <span class="op">=</span> <span class="cn">NULL</span>, peak_gene.fdr <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="fl">0.01</span>, <span class="fl">0.05</span>, <span class="fl">0.1</span>, <span class="fl">0.2</span><span class="op">)</span>, peak_gene.r_range <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="fl">0</span>,
        <span class="fl">1</span><span class="op">)</span>, allowMissingGenes <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="cn">FALSE</span>, <span class="cn">TRUE</span><span class="op">)</span>, allowMissingTFs <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="cn">FALSE</span><span class="op">)</span>, gene.types <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="st">"protein_coding"</span>,
        <span class="st">"lincRNA"</span><span class="op">)</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:21:17] Generating summary. This may take a while...</span>
<span class="co">## INFO [2022-01-21 13:21:17] </span>
<span class="co">## Real data...</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:21:17] Calculate network stats for TF-peak FDR of 0.01</span>
<span class="co">## INFO [2022-01-21 13:21:24] Calculate network stats for TF-peak FDR of 0.05</span>
<span class="co">## INFO [2022-01-21 13:21:31] Calculate network stats for TF-peak FDR of 0.1</span>
<span class="co">## INFO [2022-01-21 13:21:39] Calculate network stats for TF-peak FDR of 0.2</span>
<span class="co">## INFO [2022-01-21 13:21:46] </span>
<span class="co">## Permuted data...</span>
<span class="co">## </span>
<span class="co">## INFO [2022-01-21 13:21:46] Calculate network stats for TF-peak FDR of 0.01</span>
<span class="co">## INFO [2022-01-21 13:21:53] Calculate network stats for TF-peak FDR of 0.05</span>
<span class="co">## INFO [2022-01-21 13:22:00] Calculate network stats for TF-peak FDR of 0.1</span>
<span class="co">## INFO [2022-01-21 13:22:07] Calculate network stats for TF-peak FDR of 0.2</span></code></pre>
<div class="sourceCode" id="cb48"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/plot_stats_connectionSummary.html">plot_stats_connectionSummary</a></span><span class="op">(</span><span class="va">GRN</span>, type <span class="op">=</span> <span class="st">"heatmap"</span>, forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:22:14] Plotting connection summary to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/GRN.connectionSummary_heatmap.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:22:15] Finished writing plots to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/GRN.connectionSummary_heatmap.pdf</span>
<span class="co">## INFO [2022-01-21 13:22:15]  Finished successfully. Execution time: 0.5 secs</span></code></pre>
<div class="sourceCode" id="cb51"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/plot_stats_connectionSummary.html">plot_stats_connectionSummary</a></span><span class="op">(</span><span class="va">GRN</span>, type <span class="op">=</span> <span class="st">"boxplot"</span>, forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:22:15] Plotting diagnostic plots for network connections to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/GRN.connectionSummary_boxplot.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:22:23]  Finished successfully. Execution time: 7.9 secs</span></code></pre>
<p>The output is not very informative here and just tells us about the current progress and the parameters the function iterates over. We can now check the two new PDF files that have been created! Please see the Introductory Vignette for examples and interpretation.</p>
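<p>To make the idea of iterating over parameter combinations more concrete, here is a conceptual sketch on a toy table. It does not reproduce what <code>generateStatsSummary()</code> does internally; the column <code>TF_peak.fdr</code> and all values are assumptions for illustration, and only the number of surviving connections is recorded.</p>

```r
# Toy connection table; column names and values are illustrative assumptions.
toy = data.frame(
  TF_peak.fdr     = c(0.005, 0.03, 0.08, 0.15, 0.30),
  peak_gene.p_adj = c(0.02, 0.004, 0.12, 0.07, 0.01)
)

# All combinations of the two FDR thresholds (4 x 4 = 16 rows)
grid = expand.grid(
  TF_peak.fdr   = c(0.01, 0.05, 0.1, 0.2),
  peak_gene.fdr = c(0.01, 0.05, 0.1, 0.2)
)

# For each combination, count the connections that pass both filters
grid$n_connections = mapply(
  function(tf_fdr, pg_fdr) {
    sum(toy$TF_peak.fdr <= tf_fdr & toy$peak_gene.p_adj <= pg_fdr)
  },
  grid$TF_peak.fdr, grid$peak_gene.fdr
)

grid
```

<p>A table of this shape is what a heatmap of connection counts would be drawn from: one cell per threshold combination.</p>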
</div>
<div class="section level3">
<h3 id="enrichment-analyses">Enrichment analyses<a class="anchor" aria-label="anchor" href="#enrichment-analyses"></a>
</h3>
<p>Lastly, our framework also supports various types of enrichment analyses that are fully integrated into the package. We offer these for the full network as well as per community. For both the general and the community statistics and enrichment, the package can:</p>
<ul>
<li>calculate and plot general structure and connectivity statistics for a filtered <em>eGRN</em> (function <code><a href="../reference/plotGeneralGraphStats.html">plotGeneralGraphStats()</a></code>) and per community (functions <code><a href="../reference/calculateCommunitiesStats.html">calculateCommunitiesStats()</a></code> and <code><a href="../reference/plotCommunitiesStats.html">plotCommunitiesStats()</a></code>),</li>
<li>perform and visualize ontology enrichment for genes, for the full network (functions <code><a href="../reference/calculateGeneralEnrichment.html">calculateGeneralEnrichment()</a></code> and <code><a href="../reference/plotGeneralEnrichment.html">plotGeneralEnrichment()</a></code>) as well as per community (functions <code><a href="../reference/calculateCommunitiesEnrichment.html">calculateCommunitiesEnrichment()</a></code> and <code><a href="../reference/plotCommunitiesEnrichment.html">plotCommunitiesEnrichment()</a></code>).</li>
</ul>
<p>All functions can be called individually and adjusted flexibly, and their results are stored in the <em>GRaNIE</em> object. In the near future, we plan to extend this functionality to additional enrichment analyses based on other databases (specific disease pathways, etc.), so stay tuned! For user convenience, all aforementioned functions can be called at once via a designated wrapper function <code><a href="../reference/performAllNetworkAnalyses.html">performAllNetworkAnalyses()</a></code>.</p>
<div class="sourceCode" id="cb54"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="va">GRN</span> <span class="op">=</span> <span class="fu">GRaNIE</span><span class="fu">::</span><span class="fu"><a href="../reference/performAllNetworkAnalyses.html">performAllNetworkAnalyses</a></span><span class="op">(</span><span class="va">GRN</span>, forceRerun <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div>
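<p>For each ontology, the log of this wrapper reports how many terms have a p-value at or below several cutoffs (0.01, 0.05, 0.1 and 0.2). The following sketch shows how such a cumulative tally can be computed; the p-values are made up for illustration and are not taken from any enrichment result.</p>

```r
# Made-up enrichment p-values, one per ontology term (illustrative only)
p_values = c(0.002, 0.03, 0.04, 0.09, 0.15, 0.5, 0.8)

# The cutoffs that appear in the log output
cutoffs = c(0.01, 0.05, 0.1, 0.2)

# Cumulative counts: a term that passes 0.01 also passes every larger cutoff
counts = vapply(cutoffs, function(cutoff) sum(p_values <= cutoff), integer(1))
names(counts) = cutoffs

counts
```

<p>Because the counts are cumulative, they can only stay equal or increase from one cutoff to the next, which is a quick sanity check when reading the log.</p>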
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:22:23] Plotting general network statistics to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/GRN.overall_stats.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:22:25]  Finished successfully. Execution time: 2.3 secs</span>
<span class="co">## INFO [2022-01-21 13:22:25] Calculating general enrichment statistics... This may take a while.</span>
<span class="co">## INFO [2022-01-21 13:23:16]  Enrichment calculation finished for ontology BP. Checked 7116 terms</span>
<span class="co">## INFO [2022-01-21 13:23:16]   Number of terms for which p-value &lt;= 0.01: 18</span>
<span class="co">## INFO [2022-01-21 13:23:16]   Number of terms for which p-value &lt;= 0.05: 79</span>
<span class="co">## INFO [2022-01-21 13:23:16]   Number of terms for which p-value &lt;= 0.1: 159</span>
<span class="co">## INFO [2022-01-21 13:23:16]   Number of terms for which p-value &lt;= 0.2: 406</span>
<span class="co">## INFO [2022-01-21 13:23:22]  Enrichment calculation finished for ontology MF. Checked 1354 terms</span>
<span class="co">## INFO [2022-01-21 13:23:22]   Number of terms for which p-value &lt;= 0.01: 6</span>
<span class="co">## INFO [2022-01-21 13:23:22]   Number of terms for which p-value &lt;= 0.05: 27</span>
<span class="co">## INFO [2022-01-21 13:23:22]   Number of terms for which p-value &lt;= 0.1: 35</span>
<span class="co">## INFO [2022-01-21 13:23:22]   Number of terms for which p-value &lt;= 0.2: 90</span>
<span class="co">## INFO [2022-01-21 13:23:22] Results stored in GRN@stats[["Enrichment"]][["general"]]</span>
<span class="co">## INFO [2022-01-21 13:23:22] Finished successfully. Execution time: 57 secs</span>
<span class="co">## INFO [2022-01-21 13:23:22] Plotting general enrichment results to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/GRN.overall_enrichment.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:23:23]  Finished successfully. Execution time: 0.6 secs</span>
<span class="co">## INFO [2022-01-21 13:23:23] Calculating communities for clustering type louvain...</span>
<span class="co">## INFO [2022-01-21 13:23:23]  Finished successfully. Execution time: 0.2 secs</span>
<span class="co">## INFO [2022-01-21 13:23:23] Plotting community statistics to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/GRN.community_stats.pdf</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:23:30]  Finished successfully. Execution time: 7.3 secs</span>
<span class="co">## INFO [2022-01-21 13:23:30] Running enrichment analysis for all communities. This may take a while...</span>
<span class="co">## INFO [2022-01-21 13:23:30]  Community 4</span>
<span class="co">## INFO [2022-01-21 13:24:19]  Enrichment calculation finished for ontology BP. Checked 7116 terms</span>
<span class="co">## INFO [2022-01-21 13:24:19]   Number of terms for which p-value &lt;= 0.01: 17</span>
<span class="co">## INFO [2022-01-21 13:24:19]   Number of terms for which p-value &lt;= 0.05: 41</span>
<span class="co">## INFO [2022-01-21 13:24:19]   Number of terms for which p-value &lt;= 0.1: 151</span>
<span class="co">## INFO [2022-01-21 13:24:19]   Number of terms for which p-value &lt;= 0.2: 357</span>
<span class="co">## INFO [2022-01-21 13:24:24]  Enrichment calculation finished for ontology MF. Checked 1354 terms</span>
<span class="co">## INFO [2022-01-21 13:24:24]   Number of terms for which p-value &lt;= 0.01: 1</span>
<span class="co">## INFO [2022-01-21 13:24:24]   Number of terms for which p-value &lt;= 0.05: 8</span>
<span class="co">## INFO [2022-01-21 13:24:24]   Number of terms for which p-value &lt;= 0.1: 45</span>
<span class="co">## INFO [2022-01-21 13:24:24]   Number of terms for which p-value &lt;= 0.2: 90</span>
<span class="co">## INFO [2022-01-21 13:24:24]  Community 2</span>
<span class="co">## INFO [2022-01-21 13:25:09]  Enrichment calculation finished for ontology BP. Checked 7116 terms</span>
<span class="co">## INFO [2022-01-21 13:25:10]   Number of terms for which p-value &lt;= 0.01: 2</span>
<span class="co">## INFO [2022-01-21 13:25:10]   Number of terms for which p-value &lt;= 0.05: 30</span>
<span class="co">## INFO [2022-01-21 13:25:10]   Number of terms for which p-value &lt;= 0.1: 124</span>
<span class="co">## INFO [2022-01-21 13:25:10]   Number of terms for which p-value &lt;= 0.2: 249</span>
<span class="co">## INFO [2022-01-21 13:25:15]  Enrichment calculation finished for ontology MF. Checked 1354 terms</span>
<span class="co">## INFO [2022-01-21 13:25:15]   Number of terms for which p-value &lt;= 0.01: 0</span>
<span class="co">## INFO [2022-01-21 13:25:15]   Number of terms for which p-value &lt;= 0.05: 2</span>
<span class="co">## INFO [2022-01-21 13:25:15]   Number of terms for which p-value &lt;= 0.1: 35</span>
<span class="co">## INFO [2022-01-21 13:25:15]   Number of terms for which p-value &lt;= 0.2: 71</span>
<span class="co">## INFO [2022-01-21 13:25:15]  Community 3</span>
<span class="co">## INFO [2022-01-21 13:25:59]  Enrichment calculation finished for ontology BP. Checked 7116 terms</span>
<span class="co">## INFO [2022-01-21 13:25:59]   Number of terms for which p-value &lt;= 0.01: 6</span>
<span class="co">## INFO [2022-01-21 13:25:59]   Number of terms for which p-value &lt;= 0.05: 79</span>
<span class="co">## INFO [2022-01-21 13:25:59]   Number of terms for which p-value &lt;= 0.1: 175</span>
<span class="co">## INFO [2022-01-21 13:25:59]   Number of terms for which p-value &lt;= 0.2: 266</span>
<span class="co">## INFO [2022-01-21 13:26:04]  Enrichment calculation finished for ontology MF. Checked 1354 terms</span>
<span class="co">## INFO [2022-01-21 13:26:04]   Number of terms for which p-value &lt;= 0.01: 3</span>
<span class="co">## INFO [2022-01-21 13:26:04]   Number of terms for which p-value &lt;= 0.05: 24</span>
<span class="co">## INFO [2022-01-21 13:26:04]   Number of terms for which p-value &lt;= 0.1: 49</span>
<span class="co">## INFO [2022-01-21 13:26:04]   Number of terms for which p-value &lt;= 0.2: 71</span>
<span class="co">## INFO [2022-01-21 13:26:04]  Community 1</span>
<span class="co">## INFO [2022-01-21 13:26:43]  Enrichment calculation finished for ontology BP. Checked 7116 terms</span>
<span class="co">## INFO [2022-01-21 13:26:44]   Number of terms for which p-value &lt;= 0.01: 6</span>
<span class="co">## INFO [2022-01-21 13:26:44]   Number of terms for which p-value &lt;= 0.05: 7</span>
<span class="co">## INFO [2022-01-21 13:26:44]   Number of terms for which p-value &lt;= 0.1: 7</span>
<span class="co">## INFO [2022-01-21 13:26:44]   Number of terms for which p-value &lt;= 0.2: 7</span>
<span class="co">## INFO [2022-01-21 13:26:48]  Enrichment calculation finished for ontology MF. Checked 1354 terms</span>
<span class="co">## INFO [2022-01-21 13:26:48]   Number of terms for which p-value &lt;= 0.01: 2</span>
<span class="co">## INFO [2022-01-21 13:26:48]   Number of terms for which p-value &lt;= 0.05: 3</span>
<span class="co">## INFO [2022-01-21 13:26:48]   Number of terms for which p-value &lt;= 0.1: 3</span>
<span class="co">## INFO [2022-01-21 13:26:48]   Number of terms for which p-value &lt;= 0.2: 3</span>
<span class="co">## INFO [2022-01-21 13:26:48]  Community 5</span></code></pre>
<pre><code><span class="co">## Warning in getSigGroups(object, test.stat): No enrichment can pe performed -</span>
<span class="co">## there are no feasible GO terms!</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:27:25]  Enrichment calculation finished for ontology BP. Checked 7116 terms</span>
<span class="co">## INFO [2022-01-21 13:27:25]   Number of terms for which p-value &lt;= 0.01: 0</span>
<span class="co">## INFO [2022-01-21 13:27:25]   Number of terms for which p-value &lt;= 0.05: 0</span>
<span class="co">## INFO [2022-01-21 13:27:25]   Number of terms for which p-value &lt;= 0.1: 0</span>
<span class="co">## INFO [2022-01-21 13:27:25]   Number of terms for which p-value &lt;= 0.2: 0</span></code></pre>
<pre><code><span class="co">## Warning in getSigGroups(object, test.stat): No enrichment can pe performed -</span>
<span class="co">## there are no feasible GO terms!</span></code></pre>
<pre class="scroll-200"><code><span class="co">## INFO [2022-01-21 13:27:29]  Enrichment calculation finished for ontology MF. Checked 1354 terms</span>
<span class="co">## INFO [2022-01-21 13:27:29]   Number of terms for which p-value &lt;= 0.01: 0</span>
<span class="co">## INFO [2022-01-21 13:27:29]   Number of terms for which p-value &lt;= 0.05: 0</span>
<span class="co">## INFO [2022-01-21 13:27:29]   Number of terms for which p-value &lt;= 0.1: 0</span>
<span class="co">## INFO [2022-01-21 13:27:29]   Number of terms for which p-value &lt;= 0.2: 0</span>
<span class="co">## INFO [2022-01-21 13:27:29]  Community 6</span>
<span class="co">## INFO [2022-01-21 13:28:04]  Enrichment calculation finished for ontology BP. Checked 7116 terms</span>
<span class="co">## INFO [2022-01-21 13:28:04]   Number of terms for which p-value &lt;= 0.01: 5</span>
<span class="co">## INFO [2022-01-21 13:28:04]   Number of terms for which p-value &lt;= 0.05: 8</span>
<span class="co">## INFO [2022-01-21 13:28:04]   Number of terms for which p-value &lt;= 0.1: 8</span>
<span class="co">## INFO [2022-01-21 13:28:04]   Number of terms for which p-value &lt;= 0.2: 8</span>
<span class="co">## INFO [2022-01-21 13:28:08]  Enrichment calculation finished for ontology MF. Checked 1354 terms</span>
<span class="co">## INFO [2022-01-21 13:28:08]   Number of terms for which p-value &lt;= 0.01: 1</span>
<span class="co">## INFO [2022-01-21 13:28:08]   Number of terms for which p-value &lt;= 0.05: 2</span>
<span class="co">## INFO [2022-01-21 13:28:08]   Number of terms for which p-value &lt;= 0.1: 2</span>
<span class="co">## INFO [2022-01-21 13:28:09]   Number of terms for which p-value &lt;= 0.2: 3</span>