Commit f31efdc3 authored by Christian Arnold

Version 1.5. See changelog for details.

parent f5fa7903
@@ -138,7 +138,7 @@ A working ``R`` installation is needed and a number of packages from either CRAN
 if (!requireNamespace("BiocManager", quietly = TRUE))
     install.packages("BiocManager")
-BiocManager::install(c("limma", "vsn", "csaw", "DESeq2", "DiffBind", "geneplotter", "Rsamtools", "preprocessCore"))
+BiocManager::install(c("limma", "vsn", "csaw", "DESeq2", "DiffBind", "geneplotter", "Rsamtools", "preprocessCore", "apeglm"))
 .. _docs-runOwnAnalysis:
...
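A side note on the dependency change above (an assumption, not something shown in this commit's visible hunks): the newly added *apeglm* package is the shrinkage estimator that `DESeq2::lfcShrink()` can use via `type = "apeglm"`, so the call site is presumably in one of the files whose diff is not shown here. A minimal, purely illustrative sketch with a toy dataset:

    # Minimal sketch (assumption: apeglm is used for LFC shrinkage via DESeq2::lfcShrink)
    suppressPackageStartupMessages({
      library(DESeq2)
      library(apeglm)
    })

    # Toy DESeqDataSet purely for illustration
    dds = makeExampleDESeqDataSet(n = 500, m = 6)
    dds = DESeq(dds)

    # resultsNames(dds) contains "condition_B_vs_A" for this example dataset;
    # apeglm-based shrinkage of the log2 fold changes for that coefficient:
    res_shrunk = lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm")
    head(res_shrunk)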
@@ -55,9 +55,9 @@ author = 'Christian Arnold, Ivan Berest, Judith B. Zaugg'
 # built documents.
 #
 # The short X.Y version.
-version = '1.4'
+version = '1.5'
 # The full version, including alpha/beta/rc tags.
-release = '1.4'
+release = '1.5'
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
...
@@ -2,12 +2,11 @@
 Biological motivation
 ============================
-Transcription factor (TF) activity constitutes an important readout of cellular signalling pathways and thus for assessing regulatory differences across conditions. However, current technologies lack the ability to simultaneously assessing activity changes for multiple TFs and surprisingly little is known about whether a TF acts as repressor or activator. To this end, we introduce the widely applicable genome-wide method diffTF to assess differential TF binding activity and classifying TFs as activator or repressor by integrating any type of genome-wide chromatin with RNA-Seq data and in-silico predicted TF binding sites.
+Transcription factors (TFs) regulate many cellular processes and can therefore serve as readouts of the signaling and regulatory state. Yet for many TFs, the mode of action—repressing or activating transcription of target genes—is unclear. Here, we present diffTF (`https://git.embl.de/grp-zaugg/diffTF <https://git.embl.de/grp-zaugg/diffTF>`_) to calculate differential TF activity (basic mode) and classify TFs into putative transcriptional activators or repressors (classification mode). In basic mode, it combines genome-wide chromatin accessibility/activity with putative TF binding sites that, in classification mode, are integrated with RNA-seq. We apply diffTF to compare (1) mutated and unmutated chronic lymphocytic leukemia patients and (2) two hematopoietic progenitor cell types. In both datasets, diffTF recovers most known biology and finds many previously unreported TFs. It classifies almost 40% of TFs based on their mode of action, which we validate experimentally. Overall, we demonstrate that diffTF recovers known biology, identifies less well-characterized TFs, and classifies TFs into transcriptional activators or repressors.
 For a graphical summary of the idea, see the section :ref:`workflow`
-We also put the paper on *bioRxiv*, please see the section :ref:`citation` for details.
+The paper is open access and available online; please see the section :ref:`citation` for details.
 .. _exampleDataset:
@@ -44,15 +43,25 @@ Citation
 If you use this software, please cite the following reference:
-Ivan Berest*, Christian Arnold*, Armando Reyes-Palomares, Giovanni Palla, Kasper Dindler Rassmussen, Holly Giles, Peter-Martin Bruch, Sascha Dietrich, Wolfgang Huber, Kristian Helin & Judith B. Zaugg. *Quantification of differential transcription factor activity and multiomics-based classification into activators and repressors: diffTF*. 2019. in review.
+Ivan Berest*, Christian Arnold*, Armando Reyes-Palomares, Giovanni Palla, Kasper Dindler Rasmussen, Holly Giles, Peter-Martin Bruch, Wolfgang Huber, Sascha Dietrich, Kristian Helin, Judith B. Zaugg. *Quantification of Differential Transcription Factor Activity and Multiomics-Based Classification into Activators and Repressors: diffTF*. 2019. Cell Reports 29(10), P3147-3159.E12.
+Open Access. DOI: https://doi.org/10.1016/j.celrep.2019.10.106
-We also put the paper on *bioRxiv*, please read all methodological details here:
-`Quantification of differential transcription factor activity and multiomic-based classification into activators and repressors: diffTF <https://www.biorxiv.org/content/early/2018/12/01/368498>`_.
+.. We also put the paper on *bioRxiv*, please read all methodological details here:
+.. `Quantification of differential transcription factor activity and multiomic-based classification into activators and repressors: diffTF <https://www.biorxiv.org/content/early/2018/12/01/368498>`_.
 .. _changelog:
 Change log
 ============================
+Version 1.5 (2019-12-03)
+- *diffTF* has been published in Cell Reports! See the section :ref:`citation` for details.
+- raw and adjusted p-values for the permutation-based approach can no longer be exactly 0. We now use the approach described `here <https://genomicsclass.github.io/book/pages/permutation_tests.html>`_: in a nutshell, the smallest possible p-value is now 1/(*nPermutations* + 1), where *nPermutations* is the number of permutations, so the minimum attainable p-value becomes smaller as more permutations are run (see the short sketch after this diff excerpt).
+- for plotting TF densities, a fixed bandwidth of 0.1 was used before. This has been removed; the bandwidth is now determined automatically. We noticed that the fixed bandwidth can smooth the density curves too much in some cases, while without it the densities look more rugged. As we do not want to introduce visual artifacts, we decided to remove it.
+- various minor code improvements, particularly related to the AR classification
+- small fixes in the Snakefile
 Version 1.4 (2019-09-24)
 - Various small improvements for increasing user experience
...
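To illustrate the p-value floor described in the Version 1.5 change log above, here is a minimal sketch in R; it is not code taken from diffTF, and `statObserved`/`statPermuted` are made-up placeholder names. Counting the observed statistic as one additional permutation, as in the linked reference, guarantees that the smallest attainable p-value is 1/(nPermutations + 1):

    # Minimal sketch of the permutation p-value floor (illustrative names only)
    set.seed(1)
    nPermutations = 1000
    statObserved  = 3.2                   # test statistic computed on the real data
    statPermuted  = rnorm(nPermutations)  # test statistics from the permuted data

    # The "+ 1" counts the observed statistic itself, so the p-value can never be 0:
    pval = (sum(abs(statPermuted) >= abs(statObserved)) + 1) / (nPermutations + 1)
    pval                     # at least 1/(nPermutations + 1) by construction
    1 / (nPermutations + 1)  # the floor; more permutations allow smaller p-values

diffTF operates on its own TF-level statistics rather than on this toy example, but the floor works the same way.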
@@ -189,13 +189,8 @@ outputSummary.df = tribble(~permutation, ~TF, ~Pos_l2FC, ~Mean_l2FC, ~Median_l2
 # READ OVERLAP FILE #
 #####################
-overlapsAll.df = read_tsv(par.l$file_input_peakTFOverlaps, col_names = TRUE, col_types = cols(), comment = "#")
-if (nrow(problems(overlapsAll.df)) > 0) {
-    flog.fatal(paste0("Parsing errors: "), problems(overlapsAll.df), capture = TRUE)
-    stop("Error when parsing the file ", fileCur, ", see errors above")
-}
+overlapsAll.df = read_tidyverse_wrapper(par.l$file_input_peakTFOverlaps, type = "tsv",
+                                        col_names = TRUE, col_types = cols(), comment = "#")
 if (nrow(overlapsAll.df) > 0) {
@@ -326,11 +321,7 @@ if (skipTF) {
 # Preallocate data frame so no expensive reallocation has to be done
 log2fc.m = matrix(NA, nrow = nPeaks , ncol = par.l$nPermutations + 1)
-peaks.df = read_tsv(par.l$file_input_peak2, col_types = cols())
-if (nrow(problems(peaks.df)) > 0) {
-    flog.fatal(paste0("Parsing errors: "), problems(peaks.df), capture = TRUE)
-    stop("Error when parsing the file ", par.l$file_input_peak2, ", see errors above")
-}
+peaks.df = read_tidyverse_wrapper(par.l$file_input_peak2, type = "tsv", col_types = cols())
 peaksFiltered.df = readRDS(par.l$file_input_peaks)
@@ -344,6 +335,7 @@ if (skipTF) {
 if (par.l$nPermutations > 0) {
     # Generate normalized counts for limma analysis
+    countsRaw = DESeq2::counts(TF.cds.filt, norm = FALSE)
     countsNorm = DESeq2::counts(TF.cds.filt, norm = TRUE)
     countsNorm.transf = log2(countsNorm + par.l$pseudocountAddition)
     rownames(countsNorm.transf) = rownames(TF.cds.filt)
@@ -355,11 +347,12 @@ if (skipTF) {
 for (permutationCur in 0:par.l$nPermutations) {
-    if (permutationCur == 0) {
-        flog.info(paste0("Running for real data"))
+    if (permutationCur > 0 & (permutationCur %% 10 == 0 | permutationCur == par.l$nPermutations)) {
+        flog.info(paste0("Running permutation ", permutationCur))
     } else {
-        flog.info(paste0("Running for permutation ", permutationCur))
+        flog.info(paste0("Running for real data "))
     }
     sampleData.df = sampleData.l[[paste0("permutation", permutationCur)]]
     ##############################
@@ -392,14 +385,14 @@ if (skipTF) {
 # We already set the factors for conditionSummary explicitly. The reference level is the first level for DeSeq.
 # Run the local fit first, if that throws an error try the default fit type
-res_DESeq = tryCatch( {
+TF.cds.filt = tryCatch( {
     suppressMessages(DESeq(TF.cds.filt,fitType = 'local'))
 }, error = function(e) {
     message = "Could not run DESeq with local fitting, retry with default fitting type..."
     checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
-    res_DESeq = tryCatch( {
+    TF.cds.filt = tryCatch( {
         suppressMessages(DESeq(TF.cds.filt))
     }, error = function(e) {
@@ -408,12 +401,12 @@ if (skipTF) {
     }
     )
-    res_DESeq
+    TF.cds.filt
 }
 )
-if (class(res_DESeq) == "character") {
+if (class(TF.cds.filt) == "character") {
     skipTF = TRUE
     TF_outputInclPerm.df = as.data.frame(matrix(nrow = 0, ncol = 2 + par.l$nPermutations + 1))
@@ -425,14 +418,18 @@ if (skipTF) {
 #Enforce the correct order of the comparison
 if (comparisonMode == "pairwise") {
-    res_DESeq.df <- as.data.frame(DESeq2::results(res_DESeq, contrast = c(variableToPermute, conditionComparison[1], conditionComparison[2])))
+    contrast = c(variableToPermute, conditionComparison[1], conditionComparison[2])
 } else {
     # Same as without specifying contrast at all
-    res_DESeq.df <- as.data.frame(DESeq2::results(res_DESeq, contrast = list(variableToPermute)))
+    contrast = list(variableToPermute)
 }
+res_DESeq = DESeq2::results(TF.cds.filt, contrast = contrast)
+res_DESeq.df <- as.data.frame(res_DESeq)
 final.TF.df = tibble("TFBSID" = rownames(res_DESeq.df),
@@ -475,9 +472,10 @@ if (skipTF) {
 pdf(par.l$file_output_plot_diagnostic)
 if (par.l$nPermutations == 0) {
-    plotDiagnosticPlots(TF.cds.filt, res_DESeq, conditionComparison, filename = NULL, maxPairwiseComparisons = 0, plotMA = FALSE)
+    plotDiagnosticPlots(TF.cds.filt, conditionComparison, file = NULL, maxPairwiseComparisons = 0, plotMA = FALSE)
 } else {
-    plotDiagnosticPlots(TF.cds.filt, fit, conditionComparison, filename = NULL, maxPairwiseComparisons = 0, plotMA = FALSE)
+    plotDiagnosticPlots(fit, conditionComparison, file = NULL, maxPairwiseComparisons = 0, plotMA = FALSE, counts.raw = countsRaw, counts.norm = countsNorm)
 }
...
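Throughout the R scripts in this commit, direct `read_tsv()` calls and the per-call `problems()`/column-count checks are replaced by a single helper, `read_tidyverse_wrapper()`. Its definition lives elsewhere in the repository and is not part of the hunks shown here, so the following is only a rough sketch of what such a wrapper plausibly does, inferred from the call sites above; the argument handling and messages are assumptions, not the actual implementation:

    # Rough sketch only: inferred from the call sites, not the real diffTF helper
    read_tidyverse_wrapper_sketch = function(file, type = "tsv", ncolExpected = NULL, ...) {

      stopifnot(type == "tsv")              # only the variant used above is sketched
      data = readr::read_tsv(file, ...)     # forwards col_names, col_types, comment, ...

      # Centralised parsing checks that previously followed every read_tsv() call:
      if (nrow(readr::problems(data)) > 0) {
        stop("Error when parsing the file ", file, ", see problems() output above")
      }
      if (nrow(data) == 0) {
        stop("The file ", file, " is empty. Make sure the previous steps succeeded.")
      }
      if (!is.null(ncolExpected) && ncol(data) != ncolExpected) {
        stop("Number of columns must be ", ncolExpected, ", but file ", file,
             " has ", ncol(data), " instead")
      }
      data
    }

Centralising these checks explains why the repeated `if (nrow(problems(...)))` blocks disappear from the individual scripts below.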
@@ -104,7 +104,7 @@ testExistanceAndCreateDirectoriesRecursively(allDirs)
 TFCur = par.l$TF
-# TODO: perm.l with all TF instead of TFCur
 perm.l = list()
 calculateVariance = par.l$nPermutations == 0
@@ -161,31 +161,23 @@ if (par.l$nPermutations + 1 < length(sampleData.l)) {
 # READ NUC CG FILE #
 ####################
-flog.info(paste0("Reading and processing file ", fileCur))
-TF.motifs.CG = read_tsv(par.l$file_input_nucContentGenome, col_names = TRUE,
+TF.motifs.CG = read_tidyverse_wrapper(par.l$file_input_nucContentGenome, type = "tsv", col_names = TRUE,
                         col_types = list(
                           col_skip(), # "chr"
                           col_skip(), # "MSS"
                           col_skip(), # "MES"
                           col_character(), # "TFBSID"
                           col_character(), # "TF",
                           col_skip(), # "AT"
                           col_double(), # "CG"
                           col_skip(), # "A"
                           col_skip(), # "C"
                           col_skip(), # "G"
                           col_skip(), # "T"
                           col_skip(), # "N"
                           col_skip(), # "other_nucl"
                           col_skip() # "length"
-                        )
-)
-if (nrow(problems(TF.motifs.CG)) > 0) {
-    flog.fatal(paste0("Parsing errors: "), problems(TF.motifs.CG), capture = TRUE)
-    stop("Error when parsing the file ", fileCur, ", see errors above")
-}
+                        ))
 colnames(TF.motifs.CG) = c("TFBSID","TF","CG")
@@ -220,39 +212,23 @@ nPermutationsSkipped = 0
 for (fileCur in par.l$files_input_TF_allMotives) {
     # Log 2 fold-changes from the particular permutation
-    TF.motifs.ori = read_tsv(fileCur, col_names = TRUE,
+    TF.motifs.ori = read_tidyverse_wrapper(fileCur, type = "tsv", col_names = TRUE,
                              col_types = list(
                                col_character(), # "TF",
                                col_character(), # "TFBSID"
                                col_double() # "log2FoldChange", important to have double here as col_number would not parse numbers in scientific notation correctly
-                             )
-    )
-    if (nrow(problems(TF.motifs.ori)) > 0) {
-        flog.fatal(paste0("Parsing errors: "), problems(TF.motifs.ori), capture = TRUE)
-        stop("Error when parsing the file ", fileCur, ", see errors above")
-    }
-    if (nrow(TF.motifs.ori) == 0) {
-        message = paste0("The file ", fileCur, " is empty. Something went wrong before. Make sure the previous steps succeeded.")
-        checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
-    }
-    if (ncol(TF.motifs.ori) != 3) {
-        message = paste0("The file ", fileCur, " does not have 3 columns. Something is wrong with the number of permutations. We recommend restarting the pipeline from the DiffPeaks step.")
-        checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
-    }
+                             ))
     colnames(TF.motifs.ori) = c("TF", "TFBSID", "log2FoldChange")
     permutationCur = as.numeric(gsub(".*perm([0-9]+).tsv.gz", '\\1', fileCur))
     TF.motifs.ori$permutation = permutationCur
-    if (permutationCur > 0) {
+    if (permutationCur > 0 & (permutationCur %% 10 == 0 | permutationCur == par.l$nPermutations)) {
         flog.info(paste0("Running permutation ", permutationCur))
     } else {
-        flog.info(paste0("Running for real data ", permutationCur))
+        flog.info(paste0("Running for real data "))
     }
@@ -289,8 +265,6 @@ for (fileCur in par.l$files_input_TF_allMotives) {
 nBinsAll = length(levels(TF.motifs.all$CG.bins))
 nCol = ncol(perm.l[[TFCur]])
-# TODO: TFBSID needed?
 flog.info(paste0(" Found ", nrow(TF.motifs.all) - nrow(TF.motifs.all.unique), " duplicated TFBS across all TF."))
 # TODO: Optimize as in dev TF.motifs.all.unique = TF.motifs.all.unique[which(TF.motifs.all.unique$TF != TFCur & is.finite(TF.motifs.all.unique$log2FoldChange)),]
...
(The diff for one additional file is collapsed and not shown here.)
@@ -134,8 +134,7 @@ if (par.l$nPermutations == 0 && par.l$nBootstraps < 1000) {
 # CHECK FASTA AND BAM FILES #
 #############################
-sampleData.df = read_tsv(par.l$file_input_sampleData, col_names = TRUE, col_types = cols())
+sampleData.df = read_tidyverse_wrapper(par.l$file_input_sampleData, type = "tsv", col_names = TRUE, col_types = cols())
 components3types = checkDesignIntegrity(snakemake, par.l, sampleData.df)$types
@@ -244,12 +243,8 @@ flog.info(paste0("Check peak files..."))
 if (file_peaks != "") {
     assertFileExists(snakemake@config$peaks$consensusPeaks)
-    peaks.df = read_tsv(snakemake@config$peaks$consensusPeaks, col_names = FALSE)
-    if (nrow(problems(peaks.df)) > 0) {
-        flog.fatal(paste0("Parsing errors: "), problems(peaks.df), capture = TRUE)
-        stop("Parsing errors with file ", snakemake@config$peaks$consensusPeaks, ". See the log file for more information")
-    }
+    peaks.df = read_tidyverse_wrapper(snakemake@config$peaks$consensusPeaks, type = "tsv", col_names = FALSE)
     flog.info(paste0("Peak file contains ", nrow(peaks.df), " peaks."))
     if (nrow(peaks.df) > 100000) {
@@ -343,20 +338,8 @@ for (TFCur in allTFs) {
         checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
     }
-    tableCur.df = read_tsv(fileCur, col_names = FALSE,
-                           col_types = "ciicnc")
-    if (nrow(problems(tableCur.df)) > 0) {
-        flog.fatal(paste0("Parsing errors: "), problems(tableCur.df), capture = TRUE)
-        stop("Parsing errors for file ", TFCur, ". See the log file for more information")
-    }
-    ncols = ncol(tableCur.df)
-    if (ncols != 6) {
-        message = paste0("Number of columns must be 6, but file ", TFCur, " has ", ncols, " instead")
-        checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
-    }
+    tableCur.df = read_tidyverse_wrapper(fileCur, type = "tsv", ncolExpected = 6, col_names = FALSE, col_types = "ciicnc")
     assertIntegerish(tableCur.df$X2, lower = 0)
     assertIntegerish(tableCur.df$X3, lower = 0)
...
@@ -90,7 +90,8 @@ printParametersLog(par.l)
 ##########################
 # Provide the metadata file and parse the CSV here
-sampleMetaData.df = read_tsv(file_sampleData, col_types = cols())
+sampleMetaData.df = read_tidyverse_wrapper(file_sampleData, type = "tsv",
+                                           col_types = cols())
 assertSubset(c("SampleID", "bamReads", "conditionSummary", "Peaks"), colnames(sampleMetaData.df))
 assertIntegerish(minOverlap, upper = nrow(sampleMetaData.df))
...
@@ -119,7 +119,8 @@ printParametersLog(par.l)
 # READ METADATA #
 #################
-sampleData.df = read_tsv(par.l$file_input_sampleData, col_names = TRUE, col_types = cols())
+sampleData.df = read_tidyverse_wrapper(par.l$file_input_sampleData, type = "tsv",
+                                       col_names = TRUE, col_types = cols())
 checkAndLogWarningsAndErrors(colnames(sampleData.df), checkSubset(c("bamReads"), colnames(sampleData.df)))
@@ -169,18 +170,8 @@ if (comparisonMode == "pairwise") {
 # ITERATE THROUGH PEAK FILES #
 ##############################
-coverageAll.df = read_tsv(par.l$file_input_peakOverlaps, col_names = TRUE, comment = "#", col_types = cols())
-if (nrow(problems(coverageAll.df)) > 0) {
-    flog.fatal(paste0("Parsing errors: "), problems(coverageAll.df), capture = TRUE)
-    stop("Error when parsing the file ", fileCur, ", see errors above")
-}
-if (nrow(coverageAll.df) == 0) {
-    message = paste0("Empty file ", par.l$file_input_peaks, ".")
-    checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
-}
+coverageAll.df = read_tidyverse_wrapper(par.l$file_input_peakOverlaps, type = "tsv",
+                                        col_names = TRUE, comment = "#", col_types = cols())
 ## transform as matrix data frame with counts
 coverageAll.m = as.matrix(dplyr::select(coverageAll.df, -one_of("Geneid", "Chr", "Start", "End", "Strand", "Length")))
@@ -399,66 +390,30 @@ countsNorm.df = as.data.frame(countsNorm) %>%
     dplyr::select(one_of("peakID", colnames(countsNorm)))
-if (par.l$nPermutations == 0) {
-    # Generate normalized counts for limma analysis
-    countsNorm.transf = log2(countsNorm + par.l$pseudocountAddition)
-    rownames(countsNorm.transf) = rownames(cds.peaks.filt)
-    sampleData.df$conditionSummary = factor(sampleData.df$conditionSummary)
-    designMatrix = model.matrix(designFormula, data = sampleData.df)
-    if (nrow(designMatrix) < nrow(sampleData.df)) {
-        missingRows = setdiff(1:nrow(sampleData.df), as.integer(row.names(designMatrix)))
-        message = paste0("There is a problem with the specified design formula (parameter designContrast): The corresponding design matrix has fewer rows. This usually means that there are missing values in one of the specified variables. The problem comes from the following lines in the summary file: ", paste0(missingRows, collapse = ","), ".")
-        checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
-    }
-    fit <- eBayes(lmFit(countsNorm.transf, design = designMatrix))
-    results.df <- topTable(fit, coef = colnames(fit$design)[ncol(fit$design)], number = Inf, sort.by = "none")
-    final.peaks.df = tibble(
-        "permutation" = 0,
-        "peakID" = rownames(results.df),
-        "limma_avgExpr" = results.df$AveExpr,
-        "l2FC" = results.df$logFC,
-        "limma_B" = results.df$B,
-        "limma_t_stat" = results.df$t,
-        "pval" = results.df$P.Value,
-        "pval_adj" = results.df$adj.P.Val
-    )
-    plotDiagnosticPlots(cds.peaks.filt, fit, comparisonDESeq, par.l$file_output_plots, maxPairwiseComparisons = 0, plotMA = TRUE)
-} else {
-    # Deseq analysis
-    cds.peaks.filt = tryCatch( {
+# Deseq analysis
+cds.peaks.filt = tryCatch( {
     DESeq(cds.peaks.filt, fitType = 'local', quiet = TRUE)
 }, error = function(e) {
     message = "Warning: Could not run DESeq with local fitting, retry with default fitting type..."
     checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
     DESeq(cds.peaks.filt, quiet = TRUE)
 }
 )
-#Enforce the correct order of the comparison
-if (comparisonMode == "pairwise") {
-    cds.peaks.df <- as.data.frame(DESeq2::results(cds.peaks.filt, contrast = c(variableToPermute, comparisonDESeq[1], comparisonDESeq[2])))
-} else {
-    # Same as without specifying contrast at all
-    cds.peaks.df <- as.data.frame(DESeq2::results(cds.peaks.filt, contrast = list(variableToPermute)))
-}
+#Enforce the correct order of the comparison
+if (comparisonMode == "pairwise") {