workflow.html 156 KB
 1 2 3 4 5 6 7   Christian Arnold committed Jan 21, 2022 8 Workflow example • GRaNIEdev  9 10 11 12 13 14   Christian Arnold committed Jan 21, 2022 15   16 17 18 19 20 21 22 23 24 25 26 27   Christian Arnold committed Jan 21, 2022 28 29   30 31 32 33 34 35 36 37 38 39 40 
 Christian Arnold committed Jan 21, 2022 89 
 90 91 92 

Abstract

 Christian Arnold committed Jan 21, 2022 105 

This workflow vignette shows how to use the GRaNIE package in a real-world example. For this purpose, you will use the GRaNIEData package for a more complex analysis to illustrate most of its features. Importantly, you will also learn in detail how to work with a GRaNIE object and what its main functions and properties are. The vignette will be continuously updated whenever new functionality becomes available or when we receive user feedback.

 106 107 
 Christian Arnold committed Jan 21, 2022 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 

Example Workflow

 141 

 Christian Arnold committed Jan 21, 2022 142 143 144 

In the following example, you will use data from the GRaNIEData package to construct a eGRN from ATAC-seq, RNA-seq data as well transcription factor data.

First, let’s load the required libraries GRaNIE and GRaNIEData. The tidyverse package is already loaded and attached when loading the GRaNIE package, but we nevertheless load it here explicitly to highlight that we’ll use various tidyverse functions for data import.

For reasons of brevity, we omit the output of this code chunk.

 145 

Christian Arnold committed Jan 21, 2022  146      147      148      149      150      151      152                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       library(tidyverse) library(GRaNIEData) library(GRaNIEdev)

General notes

Each of the GRaNIE functions we mention here in this Vignette comes with sensible default parameters that we found to work well for most of the datasets we tested it with so far. However, always check the validity and usefulness of the parameters before starting an analysis to avoid unreasonable results.

 153 
 Christian Arnold committed Jan 21, 2022 154 155 156 

Reading the data required for the GRaNIE package

 Christian Arnold committed Dec 14, 2021 157 

To set up a GRaNIE analysis, we first need to read in some data into R. The following data can be used for the GRaNIE package:

 158 159 160 161 162 163 164 165 166 167 168 
• open chromatin / peak data (from either ATAC-Seq, DNAse-Seq or ChIP-Seq data, for example), hereafter simply referred to as peaks
• RNA-Seq data (gene expression counts for genes across samples)

The following data can be used optionally but are not required:

• sample metadata (e.g., sex, gender, age, sequencing batch, etc)

So, let’s import the peak and RNA-seq data as a data frame as well as some sample metadata. This can be done in any way you want as long as you end up with the right format.

 Christian Arnold committed Jan 21, 2022 169 170 
(files = list.files(pattern = "*", system.file("extdata", package = "GRaNIEData"), 
Christian Arnold committed Dec 14, 2021  171                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 full.names = TRUE))
 Christian Arnold committed Jan 21, 2022 172 173 174 175 176 177 178 179 180 
file_peaks = files[grep("countsATAC.75k.tsv.gz", files)] file_RNA = files[grep("countsRNA.sampled.tsv.gz", files)] file_sampleMetadata = files[grep("metadata.sampled.tsv", files)] folder_TFBS_first50 = files[grep("TFBS_selected", files)] 
181                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
Christian Arnold committed Jan 21, 2022  182      183      184                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           countsRNA.df = read_tsv(file_RNA, col_types = cols()) countsPeaks.df = read_tsv(file_peaks, col_types = cols()) sampleMetadata.df = read_tsv(file_sampleMetadata, col_types = cols()) 
185      186      187                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            # Let's check how the data looks like countsRNA.df
 Christian Arnold committed Jan 21, 2022 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 
## # A tibble: 35,033 × 30 ##    ENSEMBL babk_D bima_D cicb_D coyi_D diku_D eipl_D eiwy_D eofe_D fafq_D febc_D ##    <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> ##  1 ENSG00…  48933  48737  60581  93101  84980  91536  85728  35483  69674  58890 ##  2 ENSG00…  49916  44086  50706  55893  57239  76418  75934  27926  57526  50491 ##  3 ENSG00… 281733 211703 269460 239116 284509 389989 351867 164615 257471 304203 ##  4 ENSG00…  98943  77503  92402  80927  96690 138149 115875  64368  91627 100039 ##  5 ENSG00…  14749  15571  16540  16383  16886  21892  18045  10026  14663  15830 ##  6 ENSG00…  64459  63734  71317  69612  72097 100487  78536  38572  65446  76910 ##  7 ENSG00…  57449  55736  70798  66334  66424  91801  94729  40413  56916  66382 ##  8 ENSG00…  15451  15570  15534  15945  10583  22601  16086   9275  16092  15291 ##  9 ENSG00…  18717  18757  20051  18066  19648  28572  25240  11258  17739  20347 ## 10 ENSG00… 168054 147822 178164 154220 168837 244731 215862  89368 158845 180734 ## # … with 35,023 more rows, and 19 more variables: fikt_D <dbl>, guss_D <dbl>, ## #   hayt_D <dbl>, hehd_D <dbl>, heja_D <dbl>, hiaf_D <dbl>, iill_D <dbl>, ## #   kuxp_D <dbl>, nukw_D <dbl>, oapg_D <dbl>, oevr_D <dbl>, pamv_D <dbl>, ## #   pelm_D <dbl>, podx_D <dbl>, qolg_D <dbl>, sojd_D <dbl>, vass_D <dbl>, ## #   xugn_D <dbl>, zaui_D <dbl>

207                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             countsPeaks.df
 Christian Arnold committed Jan 21, 2022 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 
## # A tibble: 75,000 × 32 ##    peakID  babk_D bima_D cicb_D coyi_D diku_D eipl_D eiwy_D eofe_D fafq_D febc_D ##    <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> ##  1 chr14:…      2      5      5      3      1      4      1      5      0     13 ##  2 chrX:1…      3      7     10      5      4      6      3     18      4     22 ##  3 chr15:…      5     28     38     11     20     19      7     53      5     22 ##  4 chr10:…      0     12      7      2      5      8      0     11      1     11 ##  5 chr12:…      5     14     18      5      3     13      5     15      2     25 ##  6 chr1:1…     12     21     36      6     20     29     12     44      2    105 ##  7 chr16:…      3     17     16      9      8     16      6     28      3     33 ##  8 chr17:…      4     11      6      3      0      3      2      9      1     14 ##  9 chr13:…     10     34     44     12     31     29      9     22      5     82 ## 10 chr1:2…     21    113     46     28     44     57     47    146     12     91 ## # … with 74,990 more rows, and 21 more variables: fikt_D <dbl>, guss_D <dbl>, ## #   hayt_D <dbl>, hehd_D <dbl>, heja_D <dbl>, hiaf_D <dbl>, iill_D <dbl>, ## #   kuxp_D <dbl>, nukw_D <dbl>, oapg_D <dbl>, oevr_D <dbl>, pamv_D <dbl>, ## #   pelm_D <dbl>, podx_D <dbl>, qolg_D <dbl>, sojd_D <dbl>, vass_D <dbl>, ## #   xugn_D <dbl>, zaui_D <dbl>, uaqe_D <dbl>, qaqx_D <dbl>

227                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             sampleMetadata.df
 Christian Arnold committed Jan 21, 2022 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 
## # A tibble: 31 × 16 ##    sample_id assigned assigned_frac atac_date  clone condition  diff_start donor ##    <chr>        <dbl>         <dbl> <date>     <dbl> <chr>      <date>     <chr> ##  1 babk_D     5507093         0.211 2015-12-04     2 IFNg_SL13… 2015-10-12 babk  ##  2 bima_D    23275756         0.677 2014-12-12     1 IFNg_SL13… 2014-11-07 bima  ##  3 cicb_D    19751751         0.580 2015-04-24     3 IFNg_SL13… 2015-03-30 cicb  ##  4 coyi_D     6733642         0.312 2015-11-05     3 IFNg_SL13… 2015-09-30 coyi  ##  5 diku_D     7010213         0.195 2015-11-13     1 IFNg_SL13… 2015-10-15 diku  ##  6 eipl_D    16923025         0.520 2015-08-04     1 IFNg_SL13… 2015-06-30 eipl  ##  7 eiwy_D     9807860         0.404 2015-12-02     1 IFNg_SL13… 2015-10-23 eiwy  ##  8 eofe_D    25687477         0.646 2014-12-12     1 IFNg_SL13… 2014-11-01 eofe  ##  9 fafq_D     4600004         0.415 2015-10-14     1 IFNg_SL13… 2015-09-16 fafq  ## 10 febc_D    31712153         0.430 2015-08-04     2 IFNg_SL13… 2015-07-06 febc  ## # … with 21 more rows, and 8 more variables: EB_formation <date>, ## #   macrophage_diff_days <dbl>, medium_changes <dbl>, mt_frac <dbl>, ## #   percent_duplication <dbl>, received_as <chr>, sex <chr>, ## #   short_long_ratio <dbl>

246      247                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    # Save the name of the respective ID columns idColumn_peaks = "peakID" 
Christian Arnold committed Dec 14, 2021  248      249      250      251      252      253                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                idColumn_RNA = "ENSEMBL"  # For the sake of simplicity, we only take a subset of all samples here to # speed-up the vignette code countsRNA.df = countsRNA.df[1:10000,1:10] # countsPeaks.df = countsPeaks.df[1:20000,1:10]

While we recommend raw counts for both peaks and RNA-Seq as input and offer several normalization choices in the pipeline, it is also possible to provide pre-normalized data. Note that the normalization method may have a large influence on the resulting eGRN network, so make sure the choice of normalization is reasonable. For more details, see the next sections.

 254 255 

As you can see, both peaks and RNA-Seq counts must have exactly one ID column, with all other columns being numeric. For peaks, this column may be called peakID, for example, but the exact name is not important and can be specified as a parameter later when adding the data to the object. The same applies for the RNA-Seq data, whereas a sensible choice here is ensemblID, for example.

For the peak ID column, the required format is “chr:start-end”, with chr denoting the chromosome, followed by “:”, and then start, “-”, and end for the peak start and end, respectively. As the coordinates for the peaks are needed in the pipeline, the format must be exactly as stated here.

 Christian Arnold committed Dec 14, 2021 256 

You may notice that the peaks and RNA-seq data have different samples being included, and not all are overlapping. This is not a problem and as long as some samples are found in both of them, the GRaNIE pipeline can work with it. Note that only the shared sampels between both data modalities are kept, however, so make sure that the sample names match between them and share as many samples as possible.

 257 
 Christian Arnold committed Jan 21, 2022 258 259 260 

Initialize a GRaNIE object

 Christian Arnold committed Dec 14, 2021 261 

We got all the data in the right format, we can start with our GRaNIE analysis now! We start by specifying some parameters such as the genome assembly version the data have been produced with, as well as some optional object metadata that helps us to distinguish this GRaNIE object from others.

 Christian Arnold committed Jan 21, 2022 262 263 
# Genome assembly shortcut. Either hg19, hg38 or mm10. Both peaks and RNA data 
264      265      266                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           # must have the same genome assembly genomeAssembly = "hg38"  
Christian Arnold committed Jan 21, 2022  267      268      269                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           # Optional and arbitrary list with information and metadata that is stored # within the GRaNIE object objectMetadata.l = list(name = paste0("Macrophages_infected_primed"), file_peaks = file_peaks, 
270      271      272      273                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      file_rna = file_RNA, file_sampleMetadata = file_sampleMetadata, genomeAssembly = genomeAssembly)  dir_output = "output"  
Christian Arnold committed Jan 21, 2022  274                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             GRN = GRaNIEdev::initializeGRN(objectMetadata = objectMetadata.l, outputFolder = dir_output, 
275                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 genomeAssembly = genomeAssembly)
 Christian Arnold committed Jan 21, 2022 276 277 
## INFO [2022-01-20 22:41:50] Empty GRN object created successfully. Type the object name (e.g., GRN) to retrieve summary information about it at any time.

278                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             GRN
 Christian Arnold committed Jan 21, 2022 279 280 281 282 283 284 285 286 287 288 289 290 291 292 
## Object of class: GRaNIEdev  ( version 0.14.4 ) ## Data summary: ##  Number of peaks: No peak data found. ##  Number of genes: No RNA-seq data found. ## Parameters: ## Provided metadata: ##   name :  Macrophages_infected_primed  ##   file_peaks :  /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/countsATAC.75k.tsv.gz  ##   file_rna :  /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/countsRNA.sampled.tsv.gz  ##   file_sampleMetadata :  /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/metadata.sampled.tsv  ##   genomeAssembly :  hg38  ## Connections: ##  Number of genes (filtered, all):  NA ,  NA

Initializing a GRaNIE object occurs in the function initializeGRN() and is trivial: All we need to specify is an output folder (this is where all the pipeline output is automatically being saved unless specified otherwise) and the genome assembly shortcut of the data. We currently support hg19, hg38, and mm10. Please contact us if you need additional genomes. The metadata argument is recommended but optional and may contain an arbitrarily complex named list that is stored as additional metadata for the GRaNIE object. Here, we decided to specify a name for the GRaNIE object as well as the original paths for all 3 input files and the genome assembly.

 293 

For more parameter details, see the R help (?initializeGRN).

 Christian Arnold committed Dec 14, 2021 294 

At any time point, we can simply “print” a GRaNIE object by typing its name and a summary of the content is printed to the console.

 295 
 Christian Arnold committed Jan 21, 2022 296 297 298 299 

We are now ready to fill our empty object with data! After preparing the data beforehand, we can now use the data import function addData() to import both peaks and RNA-seq data to the GRaNIE object. In addition to the count tables, we explicitly specify the name of the ID columns. As mentioned before, the sample metadata is optional but recommended if available.

 300 

An important consideration is data normalization for RNA and ATAC. We currently support three choices of normalization: quantile, DESeq_sizeFactor and none and refer to the R help for more details (?addData). The default for RNA-Seq is a quantile normalization, while for the open chromatin peak data, it is DESeq_sizeFactor (i.e., a “regular” DESeq size factor normalization). Importantly, DESeq_sizeFactor requires raw data, while quantile does not necessarily. We nevertheless recommend raw data as input, although it is also possible to provide pre-normalized data as input and then topping this up with another normalization method or “none”.

 Christian Arnold committed Jan 21, 2022 301 302 303 
GRN = GRaNIEdev::addData(GRN, countsPeaks.df, normalization_peaks = "DESeq_sizeFactor",     idColumn_peaks = idColumn_peaks, countsRNA.df, normalization_rna = "quantile", 
Christian Arnold committed May 07, 2021  304                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 idColumn_RNA = idColumn_RNA, sampleMetadata = sampleMetadata.df, forceRerun = TRUE)
 Christian Arnold committed Jan 21, 2022 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 
## INFO [2022-01-20 22:41:50] Normalize counts. Method: DESeq_sizeFactor, ID column: peakID ## INFO [2022-01-20 22:41:56]  Finished successfully. Execution time: 5.4 secs ## INFO [2022-01-20 22:41:56] Normalize counts. Method: quantile, ID column: ENSEMBL ## INFO [2022-01-20 22:41:57]  Finished successfully. Execution time: 0.9 secs ## INFO [2022-01-20 22:41:57] Subset RNA and peaks and keep only shared samples ## INFO [2022-01-20 22:41:57]  Number of samples for RNA before filtering: 29 ## INFO [2022-01-20 22:41:57]  Number of samples for peaks before filtering: 31 ## INFO [2022-01-20 22:41:57]  29 samples (babk_D,bima_D,cicb_D,coyi_D,diku_D,eipl_D,eiwy_D,eofe_D,fafq_D,febc_D,fikt_D,guss_D,hayt_D,hehd_D,heja_D,hiaf_D,iill_D,kuxp_D,nukw_D,oapg_D,oevr_D,pamv_D,pelm_D,podx_D,qolg_D,sojd_D,vass_D,xugn_D,zaui_D) are shared between the peaks and RNA-Seq data ## WARN [2022-01-20 22:41:57] The following samples from the peaks will be ignored for the classification due to missing overlap with RNA-Seq: uaqe_D,qaqx_D ## INFO [2022-01-20 22:41:57]  Number of samples for RNA after filtering: 29 ## INFO [2022-01-20 22:41:57]  Number of samples for peaks data after filtering: 29 ## INFO [2022-01-20 22:41:57]  Finished successfully. Execution time: 0.1 secs ## INFO [2022-01-20 22:41:57] Produce 1 permutations of RNA-counts ## INFO [2022-01-20 22:41:57] Shuffling columns 1 times ## INFO [2022-01-20 22:41:57]  Finished successfully. Execution time: 0 secs ## INFO [2022-01-20 22:41:57] Parsing provided metadata... ## INFO [2022-01-20 22:42:02] Check for overlapping peaks... ## INFO [2022-01-20 22:42:04]  Calculate statistics for each peak (mean and CV) ## INFO [2022-01-20 22:42:04]  Retrieve peak annotation using ChipSeeker. This may take a while ## >> preparing features information...      2022-01-20 22:42:05  ## >> identifying nearest features...        2022-01-20 22:42:08  ## >> calculating distance from peak to TSS...   2022-01-20 22:42:09  ## >> assigning genomic annotation...        2022-01-20 22:42:09  ## >> adding gene annotation...          2022-01-20 22:42:40  ## >> assigning chromosome lengths           2022-01-20 22:42:40  ## >> done...                    2022-01-20 22:42:40  ## INFO [2022-01-20 22:42:40] Calculate GC-content for peaks...  ## INFO [2022-01-20 22:42:43]  Finished successfully. Execution time: 2.7 secs ## INFO [2022-01-20 22:42:43]  Calculate statistics for each gene (mean and CV)
 Christian Arnold committed Dec 14, 2021 334 

We can see from the output the details for the used normalization method, and the number of samples that are kept in the GRaNIE object. Here, all 29 samples from the RNA data are kept because they are also found in the peak data, while only 29 out of 31 samples from the peak data are also found in the RNA data, resulting in 29 shared samples overall. The RNA counts are also permuted, which will be the basis for all analysis and plots in subsequent steps that repeat the analysis for permuted data in addition to the real, non-permuted data.

 335 
 Christian Arnold committed Jan 21, 2022 336 337 338 339 

Quality control 1: PCA plots

It is time for our first QC plots using the function plotPCA_all()! Now that we added peak and RNA data to the object, let’s check with a Principal Component Analysis (PCA) for both peak and RNA-seq data as well as the original input and the normalized data (unless normalization has been set to none, in which case they are identical to the original data) where the variation in the data comes from. If sample metadata has been provided in the addData() function (something we strongly recommend), they are automatically added to the PCA plots by coloring the PCA results according to the provided metadata, so that potential batch effects can be examined and identified. For more details, see the R help (?plotPCA_all).

 340 

Note that while this step is recommended to do, it is fully optional from a workflow point of view.

 Christian Arnold committed Jan 21, 2022 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 
GRN = GRaNIEdev::plotPCA_all(GRN, type = c("rna", "peaks"), topn = 500, forceRerun = TRUE)
## INFO [2022-01-20 22:42:44]  ## Plotting PCA and metadata correlation of raw RNA data for all shared samples to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/PCA_sharedSamples_RNA.raw.pdf... This may take a few minutes ## INFO [2022-01-20 22:42:46] Prepare PCA. Count transformation: vst ## INFO [2022-01-20 22:42:46]  Writing to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/PCA_sharedSamples_RNA.raw.pdf
## INFO [2022-01-20 22:42:47] Performing and summarizing PCs across metadata for top 500 features
## INFO [2022-01-20 22:42:50]  ## Plotting PCA and metadata correlation of normalized RNA data for all shared samples to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/PCA_sharedSamples_RNA.normalized.pdf... This may take a few minutes ## INFO [2022-01-20 22:42:50] Prepare PCA. Count transformation: none ## INFO [2022-01-20 22:42:50]  Writing to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/PCA_sharedSamples_RNA.normalized.pdf
## INFO [2022-01-20 22:42:51] Performing and summarizing PCs across metadata for top 500 features
## INFO [2022-01-20 22:42:54] Plotting PCA and metadata correlation of raw peaks data for all shared samples to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/PCA_sharedSamples_peaks.raw.pdf... This may take a few minutes ## INFO [2022-01-20 22:42:56] Prepare PCA. Count transformation: vst ## INFO [2022-01-20 22:42:56]  Writing to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/PCA_sharedSamples_peaks.raw.pdf
## INFO [2022-01-20 22:42:59] Performing and summarizing PCs across metadata for top 500 features
## INFO [2022-01-20 22:43:02] Plotting PCA and metadata correlation of normalized peaks data for all shared samples to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/PCA_sharedSamples_peaks.normalized.pdf... This may take a few minutes ## INFO [2022-01-20 22:43:02] Prepare PCA. Count transformation: none ## INFO [2022-01-20 22:43:02]  Writing to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/PCA_sharedSamples_peaks.normalized.pdf
## INFO [2022-01-20 22:43:04] Performing and summarizing PCs across metadata for top 500 features
 Christian Arnold committed May 25, 2021 361 

We can see from the output that four PDF files have been produced, each of which plots the PCA results for the most variable 500, 1000, and 5000 features, respectively. For reasons of brevity and organization, we describe their interpretation and meaning in detail in the Introductory vignette and not here, however (click here for guidance and example plots).

 362 
 Christian Arnold committed Jan 21, 2022 363 364 365 366 

Add TFs and TFBS and overlap with peaks

Now it is time to add data for TFs and predicted TF binding sites (TFBS)! Our GRaNIE package requires pre-computed TFBS that need to be in a specific format (see the Package Details Vignette for details). In brief, a 6-column bed file must be present for each TF, with a specific file name that starts with the name of the TF, an arbitrary and optional suffix (here: “_TFBS”) and a particular file ending (supported are bed or bed.gz; here, we specify the latter). All these files must be located in a particular folder that the addTFBS() functions then searches in order to identify those files that match the specified patterns. We provide example TFBS for the 3 genome assemblies we support, see the comment below and the Package Details Vignette for details. After setting this up, we are ready to overlap the TFBS and the peaks by calling the function overlapPeaksAndTFBS().

 367 

For more parameter details, see the R help (?addTFBS and ?overlapPeaksAndTFBS).

 Christian Arnold committed Jan 21, 2022 368 369 
GRN = GRaNIEdev::addTFBS(GRN, motifFolder = folder_TFBS_first50, TFs = "all", filesTFBSPattern = "_TFBS", 
370                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 fileEnding = ".bed.gz", forceRerun = TRUE)
 Christian Arnold committed Jan 21, 2022 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 
## INFO [2022-01-20 22:43:07] Checking database folder for matching files: /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/TFBS_selected ## INFO [2022-01-20 22:43:07] Found 75 matching TFs: AIRE.0.C, ANDR.0.A, ANDR.1.A, ANDR.2.A, AP2A.0.A, AP2B.0.B, ARI3A.0.D, ARNT2.0.D, ASCL1.0.A, ASCL2.0.D, ATF2.1.B, ATOH1.0.B, BACH1.0.A, BATF3.0.B, BC11A.0.A, BCL6.0.A, BHA15.0.B, BHE41.0.D, BPTF.0.D, BRAC.0.A, BRCA1.0.D, CDX1.0.C, CDX2.0.A, CEBPA.0.A, CENPB.0.D, CLOCK.0.C, COE1.0.A, COT1.0.C, COT1.1.C, COT2.0.A, COT2.1.A, CTCF.0.A, CTCFL.0.A, CUX2.0.D, DLX1.0.D, DLX2.0.D, DLX4.0.D, DLX6.0.D, DMBX1.0.D, DMRT1.0.D, E2F1.0.A, E2F3.0.A, E2F4.0.A, E2F6.0.A, E2F7.0.B, EGR1.0.A, EGR2.0.A, EGR2.1.A, EHF.0.B, ELF1.0.A, ELF3.0.A, ELK3.0.D, ERR1.0.A, ESR1.0.A, ESR1.1.A, ESR2.0.A, ESR2.1.A, ETS1.0.A, ETS2.0.B, ETV2.0.B, ETV4.0.B, ETV5.0.C, EVI1.0.B, FEZF1.0.C, FLI1.1.A, FOXA3.0.B, FOXB1.0.D, FOXC2.0.D, FOXD2.0.D, FOXD3.0.D, FOXF1.0.D, FOXO4.0.C, FOXP1.0.A, FOXP3.0.D, FUBP1.0.D ## INFO [2022-01-20 22:43:07] Use all TF from the database folder /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/TFBS_selected ## INFO [2022-01-20 22:43:07] Reading file /media/carnold/DATADRIVE1/R/x86_64-pc-linux-gnu-library/4.1/GRaNIEData/extdata/TFBS_selected/translationTable.csv ## INFO [2022-01-20 22:43:07]  Finished successfully. Execution time: 0 secs ## INFO [2022-01-20 22:43:07] Running the pipeline for 75 TF in total.
GRN = GRaNIEdev::overlapPeaksAndTFBS(GRN, nCores = 1, forceRerun = TRUE)
## INFO [2022-01-20 22:43:07] Overlap peaks and TFBS using 1 cores. This may take a few minutes... ## INFO [2022-01-20 22:43:09]  Calculating intersection for TF AIRE.0.C finished. Number of overlapping TFBS after filtering: 295 ## INFO [2022-01-20 22:43:10]  Calculating intersection for TF ANDR.0.A finished. Number of overlapping TFBS after filtering: 1182 ## INFO [2022-01-20 22:43:11]  Calculating intersection for TF ANDR.1.A finished. Number of overlapping TFBS after filtering: 1007 ## INFO [2022-01-20 22:43:12]  Calculating intersection for TF ANDR.2.A finished. Number of overlapping TFBS after filtering: 1385 ## INFO [2022-01-20 22:43:13]  Calculating intersection for TF ARI3A.0.D finished. Number of overlapping TFBS after filtering: 390 ## INFO [2022-01-20 22:43:14]  Calculating intersection for TF ARNT2.0.D finished. Number of overlapping TFBS after filtering: 1906 ## INFO [2022-01-20 22:43:15]  Calculating intersection for TF ASCL1.0.A finished. Number of overlapping TFBS after filtering: 3454 ## INFO [2022-01-20 22:43:16]  Calculating intersection for TF ASCL2.0.D finished. Number of overlapping TFBS after filtering: 2701 ## INFO [2022-01-20 22:43:17]  Calculating intersection for TF ATF2.1.B finished. Number of overlapping TFBS after filtering: 918 ## INFO [2022-01-20 22:43:18]  Calculating intersection for TF ATOH1.0.B finished. Number of overlapping TFBS after filtering: 2044 ## INFO [2022-01-20 22:43:19]  Calculating intersection for TF BACH1.0.A finished. Number of overlapping TFBS after filtering: 2786 ## INFO [2022-01-20 22:43:19]  Calculating intersection for TF BATF3.0.B finished. Number of overlapping TFBS after filtering: 1095 ## INFO [2022-01-20 22:43:21]  Calculating intersection for TF BC11A.0.A finished. Number of overlapping TFBS after filtering: 10545 ## INFO [2022-01-20 22:43:22]  Calculating intersection for TF BCL6.0.A finished. Number of overlapping TFBS after filtering: 1204 ## INFO [2022-01-20 22:43:23]  Calculating intersection for TF BHA15.0.B finished. Number of overlapping TFBS after filtering: 3414 ## INFO [2022-01-20 22:43:24]  Calculating intersection for TF BHE41.0.D finished. Number of overlapping TFBS after filtering: 1638 ## INFO [2022-01-20 22:43:25]  Calculating intersection for TF BPTF.0.D finished. Number of overlapping TFBS after filtering: 1388 ## INFO [2022-01-20 22:43:26]  Calculating intersection for TF BRCA1.0.D finished. Number of overlapping TFBS after filtering: 731 ## INFO [2022-01-20 22:43:27]  Calculating intersection for TF CDX1.0.C finished. Number of overlapping TFBS after filtering: 778 ## INFO [2022-01-20 22:43:28]  Calculating intersection for TF CDX2.0.A finished. Number of overlapping TFBS after filtering: 449 ## INFO [2022-01-20 22:43:28]  Calculating intersection for TF CEBPA.0.A finished. Number of overlapping TFBS after filtering: 1327 ## INFO [2022-01-20 22:43:29]  Calculating intersection for TF CENPB.0.D finished. Number of overlapping TFBS after filtering: 1057 ## INFO [2022-01-20 22:43:30]  Calculating intersection for TF CLOCK.0.C finished. Number of overlapping TFBS after filtering: 1328 ## INFO [2022-01-20 22:43:32]  Calculating intersection for TF CTCF.0.A finished. Number of overlapping TFBS after filtering: 8572 ## INFO [2022-01-20 22:43:33]  Calculating intersection for TF CTCFL.0.A finished. Number of overlapping TFBS after filtering: 8586 ## INFO [2022-01-20 22:43:34]  Calculating intersection for TF CUX2.0.D finished. Number of overlapping TFBS after filtering: 182 ## INFO [2022-01-20 22:43:34]  Calculating intersection for TF DLX1.0.D finished. Number of overlapping TFBS after filtering: 192 ## INFO [2022-01-20 22:43:35]  Calculating intersection for TF DLX2.0.D finished. Number of overlapping TFBS after filtering: 235 ## INFO [2022-01-20 22:43:36]  Calculating intersection for TF DLX4.0.D finished. Number of overlapping TFBS after filtering: 127 ## INFO [2022-01-20 22:43:36]  Calculating intersection for TF DLX6.0.D finished. Number of overlapping TFBS after filtering: 117 ## INFO [2022-01-20 22:43:37]  Calculating intersection for TF DMBX1.0.D finished. Number of overlapping TFBS after filtering: 132 ## INFO [2022-01-20 22:43:37]  Calculating intersection for TF DMRT1.0.D finished. Number of overlapping TFBS after filtering: 460 ## INFO [2022-01-20 22:43:38]  Calculating intersection for TF E2F1.0.A finished. Number of overlapping TFBS after filtering: 3117 ## INFO [2022-01-20 22:43:39]  Calculating intersection for TF E2F3.0.A finished. Number of overlapping TFBS after filtering: 1702 ## INFO [2022-01-20 22:43:39]  Calculating intersection for TF E2F4.0.A finished. Number of overlapping TFBS after filtering: 4214 ## INFO [2022-01-20 22:43:41]  Calculating intersection for TF E2F6.0.A finished. Number of overlapping TFBS after filtering: 5571 ## INFO [2022-01-20 22:43:42]  Calculating intersection for TF E2F7.0.B finished. Number of overlapping TFBS after filtering: 4742 ## INFO [2022-01-20 22:43:42]  Calculating intersection for TF COE1.0.A finished. Number of overlapping TFBS after filtering: 2350 ## INFO [2022-01-20 22:43:44]  Calculating intersection for TF EGR1.0.A finished. Number of overlapping TFBS after filtering: 8727 ## INFO [2022-01-20 22:43:47]  Calculating intersection for TF EGR2.0.A finished. Number of overlapping TFBS after filtering: 12510 ## INFO [2022-01-20 22:43:49]  Calculating intersection for TF EGR2.1.A finished. Number of overlapping TFBS after filtering: 8788 ## INFO [2022-01-20 22:43:50]  Calculating intersection for TF EHF.0.B finished. Number of overlapping TFBS after filtering: 4947 ## INFO [2022-01-20 22:43:51]  Calculating intersection for TF ELF1.0.A finished. Number of overlapping TFBS after filtering: 3497 ## INFO [2022-01-20 22:43:53]  Calculating intersection for TF ELF3.0.A finished. Number of overlapping TFBS after filtering: 5449 ## INFO [2022-01-20 22:43:53]  Calculating intersection for TF ELK3.0.D finished. Number of overlapping TFBS after filtering: 2171 ## INFO [2022-01-20 22:43:54]  Calculating intersection for TF ESR1.0.A finished. Number of overlapping TFBS after filtering: 1448 ## INFO [2022-01-20 22:43:55]  Calculating intersection for TF ESR1.1.A finished. Number of overlapping TFBS after filtering: 1604 ## INFO [2022-01-20 22:43:56]  Calculating intersection for TF ESR2.0.A finished. Number of overlapping TFBS after filtering: 1878 ## INFO [2022-01-20 22:43:57]  Calculating intersection for TF ESR2.1.A finished. Number of overlapping TFBS after filtering: 3875 ## INFO [2022-01-20 22:43:58]  Calculating intersection for TF ERR1.0.A finished. Number of overlapping TFBS after filtering: 1267 ## INFO [2022-01-20 22:43:59]  Calculating intersection for TF ETS1.0.A finished. Number of overlapping TFBS after filtering: 6255 ## INFO [2022-01-20 22:44:01]  Calculating intersection for TF ETS2.0.B finished. Number of overlapping TFBS after filtering: 7322 ## INFO [2022-01-20 22:44:02]  Calculating intersection for TF ETV2.0.B finished. Number of overlapping TFBS after filtering: 6413 ## INFO [2022-01-20 22:44:03]  Calculating intersection for TF ETV4.0.B finished. Number of overlapping TFBS after filtering: 5073 ## INFO [2022-01-20 22:44:05]  Calculating intersection for TF ETV5.0.C finished. Number of overlapping TFBS after filtering: 10335 ## INFO [2022-01-20 22:44:06]  Calculating intersection for TF FEZF1.0.C finished. Number of overlapping TFBS after filtering: 1030 ## INFO [2022-01-20 22:44:07]  Calculating intersection for TF FLI1.1.A finished. Number of overlapping TFBS after filtering: 8982 ## INFO [2022-01-20 22:44:08]  Calculating intersection for TF FOXA3.0.B finished. Number of overlapping TFBS after filtering: 485 ## INFO [2022-01-20 22:44:09]  Calculating intersection for TF FOXB1.0.D finished. Number of overlapping TFBS after filtering: 257 ## INFO [2022-01-20 22:44:09]  Calculating intersection for TF FOXC2.0.D finished. Number of overlapping TFBS after filtering: 676 ## INFO [2022-01-20 22:44:10]  Calculating intersection for TF FOXD2.0.D finished. Number of overlapping TFBS after filtering: 240 ## INFO [2022-01-20 22:44:11]  Calculating intersection for TF FOXD3.0.D finished. Number of overlapping TFBS after filtering: 958 ## INFO [2022-01-20 22:44:12]  Calculating intersection for TF FOXF1.0.D finished. Number of overlapping TFBS after filtering: 441 ## INFO [2022-01-20 22:44:12]  Calculating intersection for TF FOXO4.0.C finished. Number of overlapping TFBS after filtering: 392 ## INFO [2022-01-20 22:44:13]  Calculating intersection for TF FOXP1.0.A finished. Number of overlapping TFBS after filtering: 435 ## INFO [2022-01-20 22:44:14]  Calculating intersection for TF FOXP3.0.D finished. Number of overlapping TFBS after filtering: 358 ## INFO [2022-01-20 22:44:15]  Calculating intersection for TF FUBP1.0.D finished. Number of overlapping TFBS after filtering: 1034 ## INFO [2022-01-20 22:44:16]  Calculating intersection for TF EVI1.0.B finished. Number of overlapping TFBS after filtering: 243 ## INFO [2022-01-20 22:44:17]  Calculating intersection for TF COT1.0.C finished. Number of overlapping TFBS after filtering: 4789 ## INFO [2022-01-20 22:44:18]  Calculating intersection for TF COT1.1.C finished. Number of overlapping TFBS after filtering: 2968 ## INFO [2022-01-20 22:44:19]  Calculating intersection for TF COT2.0.A finished. Number of overlapping TFBS after filtering: 1403 ## INFO [2022-01-20 22:44:20]  Calculating intersection for TF COT2.1.A finished. Number of overlapping TFBS after filtering: 2539 ## INFO [2022-01-20 22:44:21]  Calculating intersection for TF BRAC.0.A finished. Number of overlapping TFBS after filtering: 740 ## INFO [2022-01-20 22:44:22]  Calculating intersection for TF AP2A.0.A finished. Number of overlapping TFBS after filtering: 2987 ## INFO [2022-01-20 22:44:23]  Calculating intersection for TF AP2B.0.B finished. Number of overlapping TFBS after filtering: 4197 ## INFO [2022-01-20 22:44:23]  Finished execution using 1 cores. TOTAL RUNNING TIME: 1.3 mins
 Christian Arnold committed Dec 14, 2021 456 

We see from the output that 75 TFs have been found in the specified input folder, and the number of TFBS that overlap our peaks for each of them. We now successfully added our TFs and TFBS to the GRaNIE object.

 457 
 Christian Arnold committed Jan 21, 2022 458 459 460 461 

Filter data (optional)

Optionally, we can filter both peaks and RNA-Seq data according to various criteria using the function filterData().

 462 463 464 465 466 467 468 469 470 

For the open chromatin peaks, we currently support three filters:

1. Filter by their normalized mean read counts (minNormalizedMean_peaks, default 5)
2. Filter by their size / width (in bp) and discarding peaks that exceed a particular threshold (maxSize_peaks, default: 10000 bp)
3. Filter by chromosome (only keep chromosomes that are provided as input to the function, chrToKeep_peaks)

For RNA-seq, we currently support the analogous filter as for open chromatin for normalized mean counts as explained above (minNormalizedMeanRNA).

The default values are usually suitable for bulk data and should result in the removal of very few peaks / genes; however, for single-cell data, lowering them may more reasonable. The output will print clearly how many peaks and genes have been filtered, so you can rerun the function with different values if needed.

For more parameter details, see the R help (?filterData).

 Christian Arnold committed Jan 21, 2022 471 

472                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             # Chromosomes to keep for peaks. This should be a vector of chromosome names 
Christian Arnold committed Jan 21, 2022  473      474                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    chrToKeep_peaks = c(paste0("chr", 1:22), "chrX", "chrY") GRN = GRaNIEdev::filterData(GRN, minNormalizedMean_peaks = 5, minNormalizedMeanRNA = 1, 
Christian Arnold committed May 07, 2021  475                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 chrToKeep_peaks = chrToKeep_peaks, maxSize_peaks = 10000, forceRerun = TRUE)
 Christian Arnold committed Jan 21, 2022 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 
## INFO [2022-01-20 22:44:23] FILTER PEAKS ## INFO [2022-01-20 22:44:23]  Number of peaks before filtering : 75000 ## INFO [2022-01-20 22:44:23]   Filter peaks by CV: Min = 0 ## INFO [2022-01-20 22:44:23]   Filter peaks by mean: Min = 5 ## INFO [2022-01-20 22:44:23]  Number of peaks after filtering : 64008 ## INFO [2022-01-20 22:44:23]  Finished successfully. Execution time: 0.1 secs ## INFO [2022-01-20 22:44:23] Filter and sort peaks and remain only those on the following chromosomes: chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY ## INFO [2022-01-20 22:44:23] Filter and sort peaks by size and remain only those smaller than : 10000 ## INFO [2022-01-20 22:44:23]  Number of peaks before filtering: 75000 ## INFO [2022-01-20 22:44:24]  Number of peaks after filtering : 75000 ## INFO [2022-01-20 22:44:24]  Finished successfully. Execution time: 0.6 secs ## INFO [2022-01-20 22:44:24] Collectively, filter 10992 out of 75000 peaks. ## INFO [2022-01-20 22:44:24] Number of remaining peaks: 64008 ## INFO [2022-01-20 22:44:24] FILTER RNA-seq ## INFO [2022-01-20 22:44:24]  Number of genes before filtering : 61534 ## INFO [2022-01-20 22:44:24]   Filter genes by CV: Min = 0 ## INFO [2022-01-20 22:44:24]   Filter genes by mean: Min = 1 ## INFO [2022-01-20 22:44:24]  Number of genes after filtering : 18924 ## INFO [2022-01-20 22:44:24]  Finished successfully. Execution time: 0.1 secs ## INFO [2022-01-20 22:44:24]  Number of rows in total: 35033 ## INFO [2022-01-20 22:44:24]  Flagged 16211 rows because the row mean was smaller than 1
 Christian Arnold committed May 25, 2021 497 

We can see from the output that no peaks have been filtered due to their size and almost 11,000 have been filtered due to their small mean read counts, which collectively leaves around 64,000 peaks out of 75,000 originally. For the RNA data, almost half of the data has been filtered (16,211 out of around 35,000 genes).

 498 
 Christian Arnold committed Jan 21, 2022 499 500 501 502 

We now have all necessary data in the object to start constructing our network. As explained in the Package Details Vignette, we currently support two types of links for our GRaNIE approach:

 503 504 505 506 
1. TF - peak
2. peak - gene
 Christian Arnold committed Jan 21, 2022 507 

Let’s start with TF-peak links! For this, we employ the function addConnections_TF_peak(). By default, we use Pearson to calculate the correlations between TF expression and peak accessibility, but Spearman may sometimes be a better alternative, especially if the diagnostic plots show that the background is not looking as expected.

 Christian Arnold committed Dec 14, 2021 508 

In addition to creating TF-peak links based on TF expression, we can also correlate peak accessibility with other measures. We call this the connection type, and expression is the default one in our framework. However, we implemented a flexible way of allowing also additional or other connection types. Briefly, this works as follows: Additional data has to be imported beforehand with a particular name (the name of the connection type). For example, measures that are related to so-called TF activity can be used in addition or as a replacement of TF expression. For each connection type that we want to include, we simply add it to the parameter connectionTypes along with the binary vector removeNegativeCorrelation that specifies whether or not negatively correlated pairs should be removed or not. For expression, the default is to not remove them, while removal may be more reasonable for measures related to TF activity.

 509 

Lastly, we offer a so called GC-correction that uses a GC-matching background to compare it with the foreground instead of using the full background as comparison. We are still investigating the plausibility and effects of this and therefore mark this feature as experimental as of now.

 Christian Arnold committed Dec 14, 2021 510 

This function may run a while, and each time-consuming step has a built-in progress bar so the remaining time can be estimated. Note that the TF-peak links are constructed for both the original, non-permuted data (in the corresponding output plots that are produced, this is labeled as original) and permuted data (permuted). For more parameter options and parameter details, see the R help (?addConnections_TF_peak).

 Christian Arnold committed Jan 21, 2022 511 512 
GRN = GRaNIEdev::addConnections_TF_peak(GRN, plotDiagnosticPlots = TRUE, connectionTypes = c("expression"), 
Christian Arnold committed Dec 14, 2021  513                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 corMethod = "pearson", forceRerun = TRUE)
 Christian Arnold committed Jan 21, 2022 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 
## INFO [2022-01-20 22:44:24]  ## Real data ##  ## INFO [2022-01-20 22:44:24] Calculate TF-peak links for connection type expression ## INFO [2022-01-20 22:44:24]  Correlate expression and peak counts ## INFO [2022-01-20 22:44:24]   Retain 59 rows from TF/gene data out of 18822 (filter non-TF genes and TF genes with 0 counts throughout and keep only unique ENSEMBL IDs). ## INFO [2022-01-20 22:44:24]   Correlate TF/gene data for 59 unique Ensembl IDs (TFs) and peak counts for 64008 peaks. ## INFO [2022-01-20 22:44:24]   Note: For subsequent steps, the same gene may be associated with multiple TF, depending on the translation table. ## INFO [2022-01-20 22:44:25]   Finished successfully. Execution time: 0.6 secs ## INFO [2022-01-20 22:44:25]  Run FDR calculations for 65 TFs for which TFBS predictions and expression data for the corresponding gene are available. ## INFO [2022-01-20 22:44:25]   Skip the following 10 TF due to missing data: ATOH1.0.B,CDX1.0.C,CTCFL.0.A,DLX6.0.D,DMRT1.0.D,EHF.0.B,ESR2.0.A,ESR2.1.A,FOXA3.0.B,FOXB1.0.D ## INFO [2022-01-20 22:44:25]   Compute FDR for each TF. This may take a while... ## INFO [2022-01-20 22:44:34]   Finished successfully. Execution time: 9.6 secs ## INFO [2022-01-20 22:44:34]  Finished successfully. Execution time: 9.7 secs ## INFO [2022-01-20 22:44:34]  ## Permuted data ##  ## INFO [2022-01-20 22:44:34] Shuffling rows per column ## INFO [2022-01-20 22:44:35]  Finished successfully. Execution time: 0.4 secs ## INFO [2022-01-20 22:44:35] Calculate TF-peak links for connection type expression ## INFO [2022-01-20 22:44:35]  Correlate expression and peak counts ## INFO [2022-01-20 22:44:35]   Retain 59 rows from TF/gene data out of 18822 (filter non-TF genes and TF genes with 0 counts throughout and keep only unique ENSEMBL IDs). ## INFO [2022-01-20 22:44:35]   Correlate TF/gene data for 59 unique Ensembl IDs (TFs) and peak counts for 64008 peaks. ## INFO [2022-01-20 22:44:35]   Note: For subsequent steps, the same gene may be associated with multiple TF, depending on the translation table. ## INFO [2022-01-20 22:44:36]   Finished successfully. Execution time: 0.5 secs ## INFO [2022-01-20 22:44:36]  Run FDR calculations for 65 TFs for which TFBS predictions and expression data for the corresponding gene are available. ## INFO [2022-01-20 22:44:36]   Skip the following 10 TF due to missing data: ATOH1.0.B,CDX1.0.C,CTCFL.0.A,DLX6.0.D,DMRT1.0.D,EHF.0.B,ESR2.0.A,ESR2.1.A,FOXA3.0.B,FOXB1.0.D ## INFO [2022-01-20 22:44:36]   Compute FDR for each TF. This may take a while... ## INFO [2022-01-20 22:44:45]   Finished successfully. Execution time: 9.9 secs ## INFO [2022-01-20 22:44:45]  Finished successfully. Execution time: 10.5 secs ## INFO [2022-01-20 22:44:45] Plotting FDR curves for each TF to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_peak.fdrCurves_original.pdf ## INFO [2022-01-20 22:44:45]  Including a total of 65 TF. Preparing plots... ## INFO [2022-01-20 22:44:47]  Finished generating plots, start plotting to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_peak.fdrCurves_original.pdf. This may take a few minutes.
## INFO [2022-01-20 22:45:08] Finished writing plots to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_peak.fdrCurves_original.pdf ## INFO [2022-01-20 22:45:08]  Finished successfully. Execution time: 23.3 secs ## INFO [2022-01-20 22:45:08] Plotting FDR curves for each TF to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_peak.fdrCurves_permuted.pdf ## INFO [2022-01-20 22:45:08]  Including a total of 65 TF. Preparing plots... ## INFO [2022-01-20 22:45:11]  Finished generating plots, start plotting to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_peak.fdrCurves_permuted.pdf. This may take a few minutes.
## INFO [2022-01-20 22:45:31] Finished writing plots to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_peak.fdrCurves_permuted.pdf ## INFO [2022-01-20 22:45:31]  Finished successfully. Execution time: 23.2 secs
 554 555 

From the output, we see that a total of 65 TFs have RNA-seq data available and consequently will be included and correlated with the peak accessibility. The created PDF files are mentioned in the output and these we’ll take a look at now!

 Christian Arnold committed Jan 21, 2022 556 557 558 

Quality control 2: Diagnostic plots for TF-peak connections

 Christian Arnold committed Dec 14, 2021 559 

After adding the TF-peak links to our GRaNIE object, let’s look at some diagnostic plots. The plots folder within the specified output folder when initializing the GRaNIE object should now contain two new files that are named TF_peak.fdrCurves_original.pdf and TF_peak.fdrCurves_permuted.pdf. For reasons of brevity and organization, we describe their interpretation and meaning in detail in the Introductory vignette and not here, however.

 Christian Arnold committed May 07, 2021 560 561 562 563 564 565 566 567   568 
 Christian Arnold committed Jan 21, 2022 569 570 571 572 573 574 575 576 

Run the AR classification and QC (optional)

Transcription factors (TFs) regulate many cellular processes and can therefore serve as readouts of the signaling and regulatory state. Yet for many TFs, the mode of action—repressing or activating transcription of target genes—is unclear. In analogy to our diffTF approach that we recently published to calculate differential TF activity,the classification of TFs into putative transcriptional activators or repressors can also be done from within the GRaNIE framework in an identical fashion. This can be achieved with the function AR_classification_wrapper().

Note that this step is fully optional and can be skipped. The output of the function is not used for subsequent steps.. To keep the memory footprint of the GRaNIE object low, we recommend to set deleteIntermediateData = TRUE.

GRN = GRaNIEdev::AR_classification_wrapper(GRN, significanceThreshold_Wilcoxon = 0.05,     plot_minNoTFBS_heatmap = 100, plotDiagnosticPlots = TRUE, deleteIntermediateData = TRUE, 
Christian Arnold committed May 07, 2021  577                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 forceRerun = TRUE)
 Christian Arnold committed Jan 21, 2022 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 
## INFO [2022-01-20 22:45:31]  Connection type expression ##  ## INFO [2022-01-20 22:45:31]  Real data ##  ## INFO [2022-01-20 22:45:31]  Correlate expression and peak counts ## INFO [2022-01-20 22:45:31]  Retain 59 rows from TF/gene data out of 18822 (filter non-TF genes and TF genes with 0 counts throughout and keep only unique ENSEMBL IDs). ## INFO [2022-01-20 22:45:31]  Correlate TF/gene data for 59 unique Ensembl IDs (TFs) and peak counts for 64008 peaks. ## INFO [2022-01-20 22:45:31]  Note: For subsequent steps, the same gene may be associated with multiple TF, depending on the translation table. ## INFO [2022-01-20 22:45:32]  Finished successfully. Execution time: 0.4 secs ## INFO [2022-01-20 22:45:32] Compute foreground and background as well as their median values per TF ## INFO [2022-01-20 22:45:34]  Finished successfully. Execution time: 2 secs ## INFO [2022-01-20 22:45:34] Calculate classification thresholds for repressors / activators ## INFO [2022-01-20 22:45:34]  Stringency 0.1: -0.0306 / 0.0157 ## INFO [2022-01-20 22:45:34]  Stringency 0.05: -0.0439 / 0.0195 ## INFO [2022-01-20 22:45:35]  Stringency 0.01: -0.0536 / 0.024 ## INFO [2022-01-20 22:45:36]  Stringency 0.001: -0.0563 / 0.028 ## INFO [2022-01-20 22:45:36]  Finished successfully. Execution time: 1.9 secs ## INFO [2022-01-20 22:45:36] Finalize classification ## INFO [2022-01-20 22:45:36]  Perform Wilcoxon test for each TF. This may take a few minutes. ## INFO [2022-01-20 22:45:47]   Stringency 0.1 ## INFO [2022-01-20 22:45:47]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: AIRE.0.C,ASCL2.0.D,CUX2.0.D ## INFO [2022-01-20 22:45:47]   Stringency 0.05 ## INFO [2022-01-20 22:45:47]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: AIRE.0.C ## INFO [2022-01-20 22:45:47]   Stringency 0.01 ## INFO [2022-01-20 22:45:47]   Stringency 0.001 ## INFO [2022-01-20 22:45:47]  Summary of classification: ## INFO [2022-01-20 22:45:47]   Column classification_q0.1_final ## INFO [2022-01-20 22:45:47]    activator: 25,    undetermined: 17,    repressor: 23,    not-expressed: 10 ## INFO [2022-01-20 22:45:47]   Column classification_q0.05_final ## INFO [2022-01-20 22:45:47]    activator: 23,    undetermined: 22,    repressor: 20,    not-expressed: 10 ## INFO [2022-01-20 22:45:47]   Column classification_q0.01_final ## INFO [2022-01-20 22:45:47]    activator: 21,    undetermined: 27,    repressor: 17,    not-expressed: 10 ## INFO [2022-01-20 22:45:47]   Column classification_q0.001_final ## INFO [2022-01-20 22:45:47]    activator: 19,    undetermined: 30,    repressor: 16,    not-expressed: 10 ## INFO [2022-01-20 22:45:47]  Finished successfully. Execution time: 11.2 secs ## INFO [2022-01-20 22:45:47] Plotting density plots with foreground and background for each TF to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_classification_densityPlotsForegroundBackground_expression_original.pdf
## INFO [2022-01-20 22:45:48]  Finished successfully. Execution time: 1.2 secs ## INFO [2022-01-20 22:45:48] Plotting AR summary plot to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_classification_stringencyThresholds_expression_original.pdf
## INFO [2022-01-20 22:45:48]  Finished successfully. Execution time: 0.1 secs ## INFO [2022-01-20 22:45:48] Plotting AR heatmap to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_classification_summaryHeatmap_expression_original.pdf
## INFO [2022-01-20 22:45:50]  Finished successfully. Execution time: 1.1 secs ## INFO [2022-01-20 22:45:50]  Permuted data ##  ## INFO [2022-01-20 22:45:50]  Correlate expression and peak counts ## INFO [2022-01-20 22:45:50]  Retain 59 rows from TF/gene data out of 18822 (filter non-TF genes and TF genes with 0 counts throughout and keep only unique ENSEMBL IDs). ## INFO [2022-01-20 22:45:50]  Correlate TF/gene data for 59 unique Ensembl IDs (TFs) and peak counts for 64008 peaks. ## INFO [2022-01-20 22:45:50]  Note: For subsequent steps, the same gene may be associated with multiple TF, depending on the translation table. ## INFO [2022-01-20 22:45:50]  Finished successfully. Execution time: 0.4 secs ## INFO [2022-01-20 22:45:50] Shuffling rows per column ## INFO [2022-01-20 22:45:51]  Finished successfully. Execution time: 0.6 secs ## INFO [2022-01-20 22:45:51] Compute foreground and background as well as their median values per TF ## INFO [2022-01-20 22:45:53]  Finished successfully. Execution time: 2 secs ## INFO [2022-01-20 22:45:53] Calculate classification thresholds for repressors / activators ## INFO [2022-01-20 22:45:54]  Stringency 0.1: -0.0215 / 0.0145 ## INFO [2022-01-20 22:45:54]  Stringency 0.05: -0.0281 / 0.0173 ## INFO [2022-01-20 22:45:55]  Stringency 0.01: -0.0307 / 0.0196 ## INFO [2022-01-20 22:45:56]  Stringency 0.001: -0.0309 / 0.0197 ## INFO [2022-01-20 22:45:56]  Finished successfully. Execution time: 2.9 secs ## INFO [2022-01-20 22:45:56] Finalize classification ## INFO [2022-01-20 22:45:56]  Perform Wilcoxon test for each TF. This may take a few minutes. ## INFO [2022-01-20 22:46:08]   Stringency 0.1 ## INFO [2022-01-20 22:46:08]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: ANDR.0.A,BPTF.0.D,CENPB.0.D,DLX4.0.D,FOXD3.0.D ## INFO [2022-01-20 22:46:08]   Stringency 0.05 ## INFO [2022-01-20 22:46:08]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: ANDR.0.A,BPTF.0.D,CENPB.0.D,DLX4.0.D ## INFO [2022-01-20 22:46:08]   Stringency 0.01 ## INFO [2022-01-20 22:46:08]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: CENPB.0.D,DLX4.0.D ## INFO [2022-01-20 22:46:08]   Stringency 0.001 ## INFO [2022-01-20 22:46:08]    Change the following TFs to 'undetermined' as they were classified as activator/repressor before but the Wilcoxon test was not significant: CENPB.0.D,DLX4.0.D ## INFO [2022-01-20 22:46:08]  Summary of classification: ## INFO [2022-01-20 22:46:08]   Column classification_q0.1_final ## INFO [2022-01-20 22:46:08]    activator: 1,    undetermined: 64,    repressor: 0,    not-expressed: 10 ## INFO [2022-01-20 22:46:08]   Column classification_q0.05_final ## INFO [2022-01-20 22:46:08]    activator: 1,    undetermined: 64,    repressor: 0,    not-expressed: 10 ## INFO [2022-01-20 22:46:08]   Column classification_q0.01_final ## INFO [2022-01-20 22:46:08]    activator: 0,    undetermined: 65,    repressor: 0,    not-expressed: 10 ## INFO [2022-01-20 22:46:08]   Column classification_q0.001_final ## INFO [2022-01-20 22:46:08]    activator: 0,    undetermined: 65,    repressor: 0,    not-expressed: 10 ## INFO [2022-01-20 22:46:08]  Finished successfully. Execution time: 12.3 secs ## INFO [2022-01-20 22:46:08] Plotting density plots with foreground and background for each TF to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_classification_densityPlotsForegroundBackground_expression_permuted.pdf
## INFO [2022-01-20 22:46:09]  Finished successfully. Execution time: 1.2 secs ## INFO [2022-01-20 22:46:09] Plotting AR summary plot to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_classification_stringencyThresholds_expression_permuted.pdf
## INFO [2022-01-20 22:46:09]  Finished successfully. Execution time: 0.1 secs ## INFO [2022-01-20 22:46:09] Shuffling rows per column ## INFO [2022-01-20 22:46:10]  Finished successfully. Execution time: 0.4 secs ## INFO [2022-01-20 22:46:10] Plotting AR heatmap to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/TF_classification_summaryHeatmap_expression_permuted.pdf
## INFO [2022-01-20 22:46:11]  Finished successfully. Execution time: 0.9 secs
 664 665 666 

From the output, we see that the classification has been run for both real and permuted data, as before. For permuted data, almost all TFs are classified as undetermined, while for the non-permuted one, the majority of TFs is either an activator or repressor. This is irrespective of the classification stringency. Overall, this is not surprising and in fact re-assuring and indicates we capture real signal.

The contents of these plots is identical to and uses in fact practically the same code as our diffTF software. We refer to the following links for more details:

 Christian Arnold committed Jan 21, 2022 667 668 
1. The official diffTF paper
2. In general, the ReadTheDocs documentaion, and in particular this chapter. In File {comparisonType}.diagnosticPlotsClassification1.pdf:, pages 1-4, the content of the files “TF_classification_stringencyThresholds* are explained in detail, while in File {comparisonType}.diagnosticPlotsClassification2.pdf:, Page 20 - end the contents of the files TF_classification_summaryHeatmap and TF_classification_densityPlotsForegroundBackground are elaborated upon.
3.  669 670 671 

For more parameter details, see the R help (?AR_classification_wrapper).

 Christian Arnold committed Jan 21, 2022 672 673 674 

Save GRaNIE object to disk (optional)

 Christian Arnold committed Dec 14, 2021 675 

After steps that take up a bit of time, it may make sense to store the GRaNIE object to disk in order to be able to restore it at any time point. This can simply be done, for example, by saving it as an rds file using the built-in function saveRDS from R to save our GRaNIE object in a compressed rds format.

 Christian Arnold committed Jan 21, 2022 676 677 678 679 680 
GRN_file_outputRDS = paste0(dir_output, "/GRN.rds") saveRDS(GRN, GRN_file_outputRDS)

You can then, at any time point, read it back into R with the following line:

 681 
 Christian Arnold committed Jan 21, 2022 682 683 684 685 686 

Let’s add now the second type of connections, peak-genes! This can be done via the function addConnections_peak_gene().

TODO more Note that to make the function run faster, we restrict the maximum peak-gene distance to 10,000 bp here, while the default is 250,000 bp.

 Christian Arnold committed May 07, 2021 687 688 689 690 691 692 693   694 

For more parameter details, see the R help (?addConnections_peak_gene).

 Christian Arnold committed Jan 21, 2022 695 696 697 
GRN = GRaNIEdev::addConnections_peak_gene(GRN, overlapTypeGene = "TSS", corMethod = "pearson",     promoterRange = 10000, TADs = NULL, nCores = 1, plotDiagnosticPlots = TRUE, plotGeneTypes = list(c("all")), 
Christian Arnold committed Dec 14, 2021  698                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 forceRerun = TRUE)
 Christian Arnold committed Jan 21, 2022 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 
## INFO [2022-01-20 22:46:11]  ## Real data ##  ## INFO [2022-01-20 22:46:11] Calculate peak-gene correlations for neighborhood size 10000 ## INFO [2022-01-20 22:46:11] Calculate peak gene overlaps... ## INFO [2022-01-20 22:46:11] Extend peaks based on user-defined extension size of 10000 up- and downstream. ## INFO [2022-01-20 22:46:11] Reading pre-compiled genome annotation data  ## INFO [2022-01-20 22:46:11]  Finished successfully. Execution time: 0.4 secs ## INFO [2022-01-20 22:46:11]  Iterate through 41912 peak-gene combinations and (if possible) calculate correlations using 1 cores. This may take a few minutes. ## INFO [2022-01-20 22:46:19]  Finished execution using 1 cores. TOTAL RUNNING TIME: 7.9 secs ##  ## INFO [2022-01-20 22:46:19]  Finished with calculating correlations, creating final data frame and filter NA rows due to missing RNA-seq data ## INFO [2022-01-20 22:46:19]  Initial number of rows: 41912 ## INFO [2022-01-20 22:46:20]  Finished. Final number of rows: 18804 ## INFO [2022-01-20 22:46:20]  Finished successfully. Execution time: 8.7 secs ## INFO [2022-01-20 22:46:20]  ## Permuted data ##  ## INFO [2022-01-20 22:46:20] Calculate random peak-gene correlations for neighborhood size 10000 ## INFO [2022-01-20 22:46:20] Calculate peak gene overlaps... ## INFO [2022-01-20 22:46:20] Extend peaks based on user-defined extension size of 10000 up- and downstream. ## INFO [2022-01-20 22:46:20] Reading pre-compiled genome annotation data  ## INFO [2022-01-20 22:46:20]  Finished successfully. Execution time: 0.4 secs ## INFO [2022-01-20 22:46:20]  Randomize gene-peak links by shuffling the peak IDs. ## INFO [2022-01-20 22:46:20]  Iterate through 41912 peak-gene combinations and (if possible) calculate correlations using 1 cores. This may take a few minutes. ## INFO [2022-01-20 22:46:29]  Finished execution using 1 cores. TOTAL RUNNING TIME: 8.8 secs ##  ## INFO [2022-01-20 22:46:29]  Finished with calculating correlations, creating final data frame and filter NA rows due to missing RNA-seq data ## INFO [2022-01-20 22:46:29]  Initial number of rows: 41912 ## INFO [2022-01-20 22:46:29]  Finished. Final number of rows: 18804 ## INFO [2022-01-20 22:46:29]  Finished successfully. Execution time: 9.6 secs ## INFO [2022-01-20 22:46:29] Plotting diagnostic plots for peak-gene correlations to file(s) with basename /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/peakGene_diagnosticPlots_ ## INFO [2022-01-20 22:46:29]  Gene type all
## INFO [2022-01-20 22:46:44]  Finished successfully. Execution time: 14.8 secs
 733 734 

We see from the output that almost 42,000 peak-gene links have been identified that match our parameters (here: a maximum peak-gene distance of 10 kb). From these 42.000, however, only around 18,804 actually had corresponding RNA-seq data available, while RNA-seq data was missing or has been filtered for the other. This is a rather typical case, as not all known and annotated genes are included in the RNA-seq data in the first place. Similar to before, the correlations have also been performed for the permuted data.

 Christian Arnold committed Jan 21, 2022 735 736 737 

Quality control 3: Diagnostic plots for peak-gene connections

 738 739 

Let’s now check some diagnostic plots for the peak-gene connections. In analogy to the other diagnostic plots that we encountered already before, we describe their interpretation and meaning in detail in the Introductory vignette.

 Christian Arnold committed Jan 21, 2022 740 741 742 743 744 745 746 

Combine TF-peak and peak-gene connections and filter

Now that we added both TF-peaks and peak-gene links to our GRaNIE object, we are ready to filter and combine them. So far, they are stored separately in the object for various reasons (see the Introductory Vignette for details), but ultimately, we aim for combining them to derive TF-peak-gene connections. To do so, we can simply run the filterGRNAndConnectGenes() function and filter the individual TF-peak and peak-gene links to our liking. The function has many more arguments, and we only specify a few in the example below. As before, we get a GRaNIE object back that now contains the merged and filtered TF-peak-gene connections that we can later extract. Some of the filters apply to the TF-peak links, some of them to the peak-gene links, the parameter name is intended to indicate that.

GRN = GRaNIEdev::filterGRNAndConnectGenes(GRN, TF_peak.fdr.threshold = 0.2, peak_gene.fdr.threshold = 0.2,     peak_gene.fdr.method = "BH", gene.types = c("protein_coding", "lincRNA"), allowMissingTFs = FALSE, 
Christian Arnold committed Dec 14, 2021  747                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 allowMissingGenes = FALSE)
 Christian Arnold committed Jan 21, 2022 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 
## INFO [2022-01-20 22:46:44] Filter GRN network ## INFO [2022-01-20 22:46:44]  ##  ## Real data ## INFO [2022-01-20 22:46:44] Inital number of rows left before all filtering steps: 23096 ## INFO [2022-01-20 22:46:44]  Filter network and retain only rows with TF-peak connections with an FDR < 0.2 ## INFO [2022-01-20 22:46:44]   Number of TF-peak rows before filtering TFs: 23096 ## INFO [2022-01-20 22:46:44]   Number of TF-peak rows after filtering TFs: 4907 ## INFO [2022-01-20 22:46:44] 2. Filter peak-gene connections ## INFO [2022-01-20 22:46:44]  Filter genes by gene type, keep only the following gene types: protein_coding, lincRNA ## INFO [2022-01-20 22:46:44]   Number of peak-gene rows before filtering by gene type: 18828 ## INFO [2022-01-20 22:46:44]   Number of peak-gene rows after filtering by gene type: 18734 ## INFO [2022-01-20 22:46:44] 3. Merging TF-peak with peak-gene connections and filter the combined table... ## INFO [2022-01-20 22:46:44] Inital number of rows left before all filtering steps: 5955 ## INFO [2022-01-20 22:46:44]  Filter rows with missing ENSEMBL IDs ## INFO [2022-01-20 22:46:44]   Number of rows before filtering: 5955 ## INFO [2022-01-20 22:46:44]   Number of rows after filtering: 4002 ## INFO [2022-01-20 22:46:45]  Filter network and retain only rows with peak_gene.r in the following interval: (0 - 1] ## INFO [2022-01-20 22:46:45]   Number of rows before filtering TFs: 4002 ## INFO [2022-01-20 22:46:45]   Number of rows after filtering TFs: 2364 ## INFO [2022-01-20 22:46:45]  Calculate FDR based on remaining rows, filter network and retain only rows with peak-gene connections with an FDR < 0.2 ## INFO [2022-01-20 22:46:45]   Number of rows before filtering genes (including NA): 2364 ## INFO [2022-01-20 22:46:45]   Number of rows before filtering genes (excluding NA): 2364 ## INFO [2022-01-20 22:46:45]   Number of rows after filtering genes (including NA): 626 ## INFO [2022-01-20 22:46:45]   Number of rows after filtering genes (excluding NA): 626 ## INFO [2022-01-20 22:46:45] Final number of rows left after all filtering steps: 626 ## INFO [2022-01-20 22:46:45]  Finished successfully. Execution time: 0.6 secs ## INFO [2022-01-20 22:46:45]  ##  ## Permuted data ## INFO [2022-01-20 22:46:45] Inital number of rows left before all filtering steps: 60 ## INFO [2022-01-20 22:46:45]  Filter network and retain only rows with TF-peak connections with an FDR < 0.2 ## INFO [2022-01-20 22:46:45]   Number of TF-peak rows before filtering TFs: 60 ## INFO [2022-01-20 22:46:45]   Number of TF-peak rows after filtering TFs: 25 ## INFO [2022-01-20 22:46:45] 2. Filter peak-gene connections ## INFO [2022-01-20 22:46:45]  Filter genes by gene type, keep only the following gene types: protein_coding, lincRNA ## INFO [2022-01-20 22:46:45]   Number of peak-gene rows before filtering by gene type: 18828 ## INFO [2022-01-20 22:46:45]   Number of peak-gene rows after filtering by gene type: 18734 ## INFO [2022-01-20 22:46:45] 3. Merging TF-peak with peak-gene connections and filter the combined table... ## INFO [2022-01-20 22:46:45] Inital number of rows left before all filtering steps: 26 ## INFO [2022-01-20 22:46:45]  Filter rows with missing ENSEMBL IDs ## INFO [2022-01-20 22:46:45]   Number of rows before filtering: 26 ## INFO [2022-01-20 22:46:45]   Number of rows after filtering: 6 ## INFO [2022-01-20 22:46:45]  Filter network and retain only rows with peak_gene.r in the following interval: (0 - 1] ## INFO [2022-01-20 22:46:45]   Number of rows before filtering TFs: 6 ## INFO [2022-01-20 22:46:45]   Number of rows after filtering TFs: 2 ## INFO [2022-01-20 22:46:45]  Calculate FDR based on remaining rows, filter network and retain only rows with peak-gene connections with an FDR < 0.2 ## INFO [2022-01-20 22:46:45]   Number of rows before filtering genes (including NA): 2 ## INFO [2022-01-20 22:46:45]   Number of rows before filtering genes (excluding NA): 2 ## INFO [2022-01-20 22:46:45]   Number of rows after filtering genes (including NA): 0 ## INFO [2022-01-20 22:46:45]   Number of rows after filtering genes (excluding NA): 0 ## INFO [2022-01-20 22:46:45] Final number of rows left after all filtering steps: 0 ## INFO [2022-01-20 22:46:45]  Finished successfully. Execution time: 1.1 secs
 Christian Arnold committed Dec 14, 2021 801 

The output shows the number of links before and after applying a particular filter that has been set for both real and permuted data. As expected and reassuringly, almost no connections remain for the permuted data, while the real data keeps around 2500 connections.

 802 803 

For more parameter details, see the R help (?filterGRNAndConnectGenes).

 Christian Arnold committed Jan 21, 2022 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 

Optionally, we can also include extra columns about the correlation of TF and genes directly. So far, only TF-peaks and peak-genes have been correlated, but not directly TFs and genes. Based on a filtered set of TF-peak-gene connections, the function add_TF_gene_correlation() calculates the TF-gene correlation for each connection from the filtered set for which the TF is not missing.

GRN = GRaNIEdev::add_TF_gene_correlation(GRN, corMethod = "pearson", nCores = 1,     forceRerun = TRUE)
## INFO [2022-01-20 22:46:45] Calculate correlations for TF and genes from the filtered set of connections ## INFO [2022-01-20 22:46:45]  Real data ## INFO [2022-01-20 22:46:45]   Iterate through 582 TF-gene combinations and (if possible) calculate correlations using 1 cores. This may take a few minutes. ## INFO [2022-01-20 22:46:47]  Finished execution using 1 cores. TOTAL RUNNING TIME: 1.3 secs ##  ## INFO [2022-01-20 22:46:47]   Done. Construct the final table, this may result in an increased number of TF-gene pairs due to different TF names linked to the same Ensembl ID. ## INFO [2022-01-20 22:46:47]  Permuted data ## WARN [2022-01-20 22:46:47]  Nothing to do, skip.
 819 820 821 

As can be seen from the output, the Pearson correlation for 587 TF-gene pairs has been calculated. From the around 2500 connections we obtained above, since we set the parameter allowMissingGenes = TRUE, for the majority of the TF-peak-gene connections the gene is actually missing. That is, while a TF-peak connection below the specified significance threshold exists, no corresponding gene could be found that connects to the same peak, therefore setting the gene to NA rather than excluding the row altogether.

For more parameter details, see the R help (?add_TF_gene_correlation).

 Christian Arnold committed Jan 21, 2022 822 823 824 825 826 827 

Retrieve filtered connections

We are now ready to retrieve the connections and the additional data we added to them. This can be done with the helper function getGRNConnections() that retrieves a data frame from a GRaNIE object from a particular slot. Here, we specify all.filtered, as we want to retrieve all filtered connections. For more parameter details, see the R help (getGRNConnections). Note that the first time, we assign a different variable to the return of the function (i.e., GRN_connections.all and NOT GRaNIE as before). Importantly, we have to select a new variable as we would otherwise overwrite our GRaNIE object altogether! All get functions from the GRaNIE package return an element from within the object and NOT the object itself, so please keep that in mind and always check what the functions returns before running it. You can simply do so in the R help (?getGRNConnections).

GRN_connections.all = GRaNIEdev::getGRNConnections(GRN, type = "all.filtered", include_TF_gene_correlations = TRUE) 
828      829                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     GRN_connections.all
 Christian Arnold committed Jan 21, 2022 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 
## # A tibble: 626 × 32 ##    TF.name   TF.ENSEMBL     TF_peak.r_bin TF_peak.r TF_peak.fdr TF_peak.fdr_orig ##    <chr>     <fct>          <fct>             <dbl>       <dbl>            <dbl> ##  1 BATF3.0.B ENSG000001236… [0.65,0.7)        0.684       0.185            0.185 ##  2 E2F6.0.A  ENSG000001690… [0.55,0.6)        0.550       0.156            0.156 ##  3 E2F6.0.A  ENSG000001690… [0.5,0.55)        0.514       0.175            0.175 ##  4 E2F6.0.A  ENSG000001690… [0.5,0.55)        0.539       0.175            0.175 ##  5 E2F6.0.A  ENSG000001690… [0.5,0.55)        0.539       0.175            0.175 ##  6 E2F6.0.A  ENSG000001690… [0.45,0.5)        0.494       0.191            0.191 ##  7 E2F6.0.A  ENSG000001690… [0.55,0.6)        0.585       0.156            0.156 ##  8 E2F6.0.A  ENSG000001690… [0.5,0.55)        0.501       0.175            0.175 ##  9 E2F6.0.A  ENSG000001690… [0.65,0.7)        0.663       0.166            0.166 ## 10 E2F6.0.A  ENSG000001690… [0.5,0.55)        0.528       0.175            0.175 ## # … with 616 more rows, and 26 more variables: TF_peak.fdr_direction <fct>, ## #   TF_peak.connectionType <fct>, peak.ID <fct>, peak.mean <dbl>, ## #   peak.median <dbl>, peak.CV <dbl>, peak.annotation <fct>, ## #   peak.GC.perc <dbl>, peak.width <int>, peak.GC.class <ord>, ## #   peak_gene.distance <int>, peak_gene.r <dbl>, peak_gene.p_raw <dbl>, ## #   peak_gene.p_adj <dbl>, gene.ENSEMBL <fct>, gene.mean <dbl>, ## #   gene.median <dbl>, gene.CV <dbl>, gene.chr <fct>, gene.start <int>, …

The table contains a total of 28 columns, and the prefix of each column name indicates the part of the eGRN network that the column refers to (e.g., TFs, TF-peaks, peaks, peak-genes or genes, or TF-gene if the function add_TF_gene_correlation() has been run before). Data are stored in a format that minimizes the memory footprint (e.g., each character column is stored as a factor). This table can now be used for any downstream analysis, as it is just a normal data frame.

 851 
 Christian Arnold committed Jan 21, 2022 852 853 854 

Visualize the filtered eGRN connections

 Christian Arnold committed Dec 14, 2021 855 

The GRaNIE package will soon also offer some rudimentary functions to visualize a filtered eGRN network. Stay tuned! Meanwhile, you can use the igraph package to construct a graph out of the filtered TF-peak-gene connection table (see above).

 856 
 Christian Arnold committed Jan 21, 2022 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 

Generate a connection summary

It is often useful to get a grasp of the general connectivity of a network and the number of connections that survive the filtering. This makes it possible to make an informed decision about which FDR to choose for TF-peak and peak-gene links, depending on how many links are retained and how many connections are needed for downstream analysis. To facilitate this and automate it, we offer the convenience function generateStatsSummary() that in essence iterates over different combinations of filtering parameters and calls the function filterGRNAndConnectGenes() once for each of them, and then records various connectivity statistics, and finally plots it by calling the function plot_stats_connectionSummary(). Note that running this function may take a while. Afterwards, we can graphically summarize this result in either a heatmap or a boxplot. For more parameter details, see the R help (?generateStatsSummary and plot_stats_connectionSummary).

GRN = GRaNIEdev::generateStatsSummary(GRN, TF_peak.fdr = c(0.01, 0.05, 0.1, 0.2),     TF_peak.connectionTypes = "all", peak_gene.p_raw = NULL, peak_gene.fdr = c(0.01,         0.05, 0.1, 0.2), peak_gene.r_range = c(0, 1), allowMissingGenes = c(FALSE,         TRUE), allowMissingTFs = c(FALSE), gene.types = c("protein_coding", "lincRNA"))
## INFO [2022-01-20 22:46:47] Generating summary. This may take a while... ## INFO [2022-01-20 22:46:47]  ## Real data... ##  ## INFO [2022-01-20 22:46:47] Calculate network stats for TF-peak FDR of 0.01 ## INFO [2022-01-20 22:46:54] Calculate network stats for TF-peak FDR of 0.05 ## INFO [2022-01-20 22:47:00] Calculate network stats for TF-peak FDR of 0.1 ## INFO [2022-01-20 22:47:06] Calculate network stats for TF-peak FDR of 0.2 ## INFO [2022-01-20 22:47:13]  ## Permuted data... ##  ## INFO [2022-01-20 22:47:13] Calculate network stats for TF-peak FDR of 0.01 ## INFO [2022-01-20 22:47:19] Calculate network stats for TF-peak FDR of 0.05 ## INFO [2022-01-20 22:47:26] Calculate network stats for TF-peak FDR of 0.1 ## INFO [2022-01-20 22:47:32] Calculate network stats for TF-peak FDR of 0.2
GRN = GRaNIEdev::plot_stats_connectionSummary(GRN, type = "heatmap", forceRerun = TRUE)
## INFO [2022-01-20 22:47:39] Plotting connection summary to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/GRN.connectionSummary_heatmap.pdf
## INFO [2022-01-20 22:47:39] Finished writing plots to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/GRN.connectionSummary_heatmap.pdf ## INFO [2022-01-20 22:47:39]  Finished successfully. Execution time: 0.4 secs
GRN = GRaNIEdev::plot_stats_connectionSummary(GRN, type = "boxplot", forceRerun = TRUE)
## INFO [2022-01-20 22:47:39] Plotting diagnostic plots for network connections to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/GRN.connectionSummary_boxplot.pdf
## INFO [2022-01-20 22:47:47]  Finished successfully. Execution time: 8.2 secs
 890 891 

The output is not very informative here and just tells us about the current progress and parameter it iterates over. We can now check the two new PDF files that have been created! Please see the Introductory Vignette for examples and interpretation.

 Christian Arnold committed Jan 21, 2022 892 893 894 895 

Enrichment analyses

Lastly, our framework also supports various types of enrichment analyses that are fully integrated into the package. We offer these for the full network as well as per community. The latter can be calculated v For both the general and the community statistics and enrichment, the package can:

 Christian Arnold committed Dec 14, 2021 896 
 Christian Arnold committed Jan 21, 2022 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 

All functions can be called individually, adjusted flexibly and the data is stored in the GRaNIE object for ultimate flexibility. In the near future, we plan to expand this set of functionality to additional enrichment analyses such as other databases (specific diseases pathways etc), so stay tuned! calculateCommunitiesStats() For user convenience, all aforementioned functions can be called at once via a designated wrapper function performAllNetworkAnalyses().

GRN = GRaNIEdev::performAllNetworkAnalyses(GRN, forceRerun = TRUE)
## INFO [2022-01-20 22:47:47] Plotting general network statistics to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/GRN.overall_stats.pdf
## INFO [2022-01-20 22:47:50]  Finished successfully. Execution time: 2.7 secs ## INFO [2022-01-20 22:47:50] Calculating general enrichment statistics... This may take a while. ## INFO [2022-01-20 22:48:41]  Enrichment calculation finished for ontology BP. Checked 7116 terms ## INFO [2022-01-20 22:48:41]   Number of terms for which p-value <= 0.01: 18 ## INFO [2022-01-20 22:48:41]   Number of terms for which p-value <= 0.05: 79 ## INFO [2022-01-20 22:48:41]   Number of terms for which p-value <= 0.1: 159 ## INFO [2022-01-20 22:48:41]   Number of terms for which p-value <= 0.2: 406 ## INFO [2022-01-20 22:48:47]  Enrichment calculation finished for ontology MF. Checked 1354 terms ## INFO [2022-01-20 22:48:47]   Number of terms for which p-value <= 0.01: 6 ## INFO [2022-01-20 22:48:47]   Number of terms for which p-value <= 0.05: 27 ## INFO [2022-01-20 22:48:47]   Number of terms for which p-value <= 0.1: 35 ## INFO [2022-01-20 22:48:47]   Number of terms for which p-value <= 0.2: 90 ## INFO [2022-01-20 22:48:47] Results stored in GRN@stats[["Enrichment"]][["general"]] ## INFO [2022-01-20 22:48:48] Finished successfully. Execution time: 57.6 secs ## INFO [2022-01-20 22:48:48] Plotting general enrichment results to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/GRN.overall_enrichment.pdf
## INFO [2022-01-20 22:48:48]  Finished successfully. Execution time: 0.4 secs ## INFO [2022-01-20 22:48:48] Calculating communities for clustering type louvain... ## INFO [2022-01-20 22:48:48]  Finished successfully. Execution time: 0.1 secs ## INFO [2022-01-20 22:48:48] Plotting community statistics to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/GRN.community_stats.pdf
## INFO [2022-01-20 22:48:54]  Finished successfully. Execution time: 5.9 secs ## INFO [2022-01-20 22:48:54] Running enrichment analysis for all communities. This may take a while... ## INFO [2022-01-20 22:48:54]  Community 4 ## INFO [2022-01-20 22:49:42]  Enrichment calculation finished for ontology BP. Checked 7116 terms ## INFO [2022-01-20 22:49:42]   Number of terms for which p-value <= 0.01: 17 ## INFO [2022-01-20 22:49:42]   Number of terms for which p-value <= 0.05: 41 ## INFO [2022-01-20 22:49:42]   Number of terms for which p-value <= 0.1: 151 ## INFO [2022-01-20 22:49:42]   Number of terms for which p-value <= 0.2: 357 ## INFO [2022-01-20 22:49:47]  Enrichment calculation finished for ontology MF. Checked 1354 terms ## INFO [2022-01-20 22:49:47]   Number of terms for which p-value <= 0.01: 1 ## INFO [2022-01-20 22:49:47]   Number of terms for which p-value <= 0.05: 8 ## INFO [2022-01-20 22:49:47]   Number of terms for which p-value <= 0.1: 45 ## INFO [2022-01-20 22:49:47]   Number of terms for which p-value <= 0.2: 90 ## INFO [2022-01-20 22:49:47]  Community 2 ## INFO [2022-01-20 22:50:30]  Enrichment calculation finished for ontology BP. Checked 7116 terms ## INFO [2022-01-20 22:50:30]   Number of terms for which p-value <= 0.01: 2 ## INFO [2022-01-20 22:50:30]   Number of terms for which p-value <= 0.05: 30 ## INFO [2022-01-20 22:50:30]   Number of terms for which p-value <= 0.1: 124 ## INFO [2022-01-20 22:50:30]   Number of terms for which p-value <= 0.2: 249 ## INFO [2022-01-20 22:50:34]  Enrichment calculation finished for ontology MF. Checked 1354 terms ## INFO [2022-01-20 22:50:34]   Number of terms for which p-value <= 0.01: 0 ## INFO [2022-01-20 22:50:34]   Number of terms for which p-value <= 0.05: 2 ## INFO [2022-01-20 22:50:34]   Number of terms for which p-value <= 0.1: 35 ## INFO [2022-01-20 22:50:34]   Number of terms for which p-value <= 0.2: 71 ## INFO [2022-01-20 22:50:34]  Community 3 ## INFO [2022-01-20 22:51:19]  Enrichment calculation finished for ontology BP. Checked 7116 terms ## INFO [2022-01-20 22:51:19]   Number of terms for which p-value <= 0.01: 6 ## INFO [2022-01-20 22:51:19]   Number of terms for which p-value <= 0.05: 79 ## INFO [2022-01-20 22:51:19]   Number of terms for which p-value <= 0.1: 175 ## INFO [2022-01-20 22:51:19]   Number of terms for which p-value <= 0.2: 266 ## INFO [2022-01-20 22:51:24]  Enrichment calculation finished for ontology MF. Checked 1354 terms ## INFO [2022-01-20 22:51:24]   Number of terms for which p-value <= 0.01: 3 ## INFO [2022-01-20 22:51:24]   Number of terms for which p-value <= 0.05: 24 ## INFO [2022-01-20 22:51:24]   Number of terms for which p-value <= 0.1: 49 ## INFO [2022-01-20 22:51:24]   Number of terms for which p-value <= 0.2: 71 ## INFO [2022-01-20 22:51:24]  Community 1 ## INFO [2022-01-20 22:52:02]  Enrichment calculation finished for ontology BP. Checked 7116 terms ## INFO [2022-01-20 22:52:02]   Number of terms for which p-value <= 0.01: 6 ## INFO [2022-01-20 22:52:02]   Number of terms for which p-value <= 0.05: 7 ## INFO [2022-01-20 22:52:02]   Number of terms for which p-value <= 0.1: 7 ## INFO [2022-01-20 22:52:02]   Number of terms for which p-value <= 0.2: 7 ## INFO [2022-01-20 22:52:06]  Enrichment calculation finished for ontology MF. Checked 1354 terms ## INFO [2022-01-20 22:52:06]   Number of terms for which p-value <= 0.01: 2 ## INFO [2022-01-20 22:52:06]   Number of terms for which p-value <= 0.05: 3 ## INFO [2022-01-20 22:52:07]   Number of terms for which p-value <= 0.1: 3 ## INFO [2022-01-20 22:52:07]   Number of terms for which p-value <= 0.2: 3 ## INFO [2022-01-20 22:52:07]  Community 5
## Warning in getSigGroups(object, test.stat): No enrichment can pe performed - ## there are no feasible GO terms!
## INFO [2022-01-20 22:52:37]  Enrichment calculation finished for ontology BP. Checked 7116 terms ## INFO [2022-01-20 22:52:37]   Number of terms for which p-value <= 0.01: 0 ## INFO [2022-01-20 22:52:37]   Number of terms for which p-value <= 0.05: 0 ## INFO [2022-01-20 22:52:37]   Number of terms for which p-value <= 0.1: 0 ## INFO [2022-01-20 22:52:37]   Number of terms for which p-value <= 0.2: 0
## Warning in getSigGroups(object, test.stat): No enrichment can pe performed - ## there are no feasible GO terms!
## INFO [2022-01-20 22:52:41]  Enrichment calculation finished for ontology MF. Checked 1354 terms ## INFO [2022-01-20 22:52:41]   Number of terms for which p-value <= 0.01: 0 ## INFO [2022-01-20 22:52:41]   Number of terms for which p-value <= 0.05: 0 ## INFO [2022-01-20 22:52:41]   Number of terms for which p-value <= 0.1: 0 ## INFO [2022-01-20 22:52:41]   Number of terms for which p-value <= 0.2: 0 ## INFO [2022-01-20 22:52:41]  Community 6 ## INFO [2022-01-20 22:53:13]  Enrichment calculation finished for ontology BP. Checked 7116 terms ## INFO [2022-01-20 22:53:13]   Number of terms for which p-value <= 0.01: 5 ## INFO [2022-01-20 22:53:13]   Number of terms for which p-value <= 0.05: 8 ## INFO [2022-01-20 22:53:13]   Number of terms for which p-value <= 0.1: 8 ## INFO [2022-01-20 22:53:13]   Number of terms for which p-value <= 0.2: 8 ## INFO [2022-01-20 22:53:18]  Enrichment calculation finished for ontology MF. Checked 1354 terms ## INFO [2022-01-20 22:53:18]   Number of terms for which p-value <= 0.01: 1 ## INFO [2022-01-20 22:53:18]   Number of terms for which p-value <= 0.05: 2 ## INFO [2022-01-20 22:53:18]   Number of terms for which p-value <= 0.1: 2 ## INFO [2022-01-20 22:53:18]   Number of terms for which p-value <= 0.2: 3 ## INFO [2022-01-20 22:53:18] Results stored in GRN@stats[["Enrichment"]][["byCommunity"]] ## INFO [2022-01-20 22:53:18]  Finished successfully. Execution time: 4.4 mins ## INFO [2022-01-20 22:53:18] Plotting community enrichment results to file /g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIEdev/vignettes/output/plots/GRN.community_enrichment.pdf
## INFO [2022-01-20 22:53:24]  Finished successfully. Execution time: 6.6 secs ## INFO [2022-01-20 22:53:24]  Finished successfully. Execution time: 5.6 mins
 Christian Arnold committed Dec 14, 2021 1000 

For faster browsing, not all history is shown.