Commit f31efdc3 authored by Christian Arnold's avatar Christian Arnold

Version 1.5. See changelog for details.

parent f5fa7903
......@@ -138,7 +138,7 @@ A working ``R`` installation is needed and a number of packages from either CRAN
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("limma", "vsn", "csaw", "DESeq2", "DiffBind", "geneplotter", "Rsamtools", "preprocessCore"))
BiocManager::install(c("limma", "vsn", "csaw", "DESeq2", "DiffBind", "geneplotter", "Rsamtools", "preprocessCore", "apeglm"))
.. _docs-runOwnAnalysis:
......
......@@ -55,9 +55,9 @@ author = 'Christian Arnold, Ivan Berest, Judith B. Zaugg'
# built documents.
#
# The short X.Y version.
version = '1.4'
version = '1.5'
# The full version, including alpha/beta/rc tags.
release = '1.4'
release = '1.5'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
......
......@@ -2,12 +2,11 @@
Biological motivation
============================
Transcription factor (TF) activity constitutes an important readout of cellular signalling pathways and thus for assessing regulatory differences across conditions. However, current technologies lack the ability to simultaneously assessing activity changes for multiple TFs and surprisingly little is known about whether a TF acts as repressor or activator. To this end, we introduce the widely applicable genome-wide method diffTF to assess differential TF binding activity and classifying TFs as activator or repressor by integrating any type of genome-wide chromatin with RNA-Seq data and in-silico predicted TF binding sites.
Transcription factors (TFs) regulate many cellular processes and can therefore serve as readouts of the signaling and regulatory state. Yet for many TFs, the mode of action—repressing or activating transcription of target genes—is unclear. Here, we present diffTF (`https://git.embl.de/grp-zaugg/diffTF <https://git.embl.de/grp-zaugg/diffTF>`_) to calculate differential TF activity (basic mode) and classify TFs into putative transcriptional activators or repressors (classification mode). In basic mode, it combines genome-wide chromatin accessibility/activity with putative TF binding sites that, in classification mode, are integrated with RNA-seq. We apply diffTF to compare (1) mutated and unmutated chronic lymphocytic leukemia patients and (2) two hematopoietic progenitor cell types. In both datasets, diffTF recovers most known biology and finds many previously unreported TFs. It classifies almost 40% of TFs based on their mode of action, which we validate experimentally. Overall, we demonstrate that diffTF recovers known biology, identifies less well-characterized TFs, and classifies TFs into transcriptional activators or repressors.
For a graphical summary of the idea, see the section :ref:`workflow`
We also put the paper on *bioRxiv*, please see the section :ref:`citation` for details.
The paper is open access and available online, please see the section :ref:`citation` for details.
.. _exampleDataset:
......@@ -44,15 +43,25 @@ Citation
If you use this software, please cite the following reference:
Ivan Berest*, Christian Arnold*, Armando Reyes-Palomares, Giovanni Palla, Kasper Dindler Rassmussen, Holly Giles, Peter-Martin Bruch, Sascha Dietrich, Wolfgang Huber, Kristian Helin & Judith B. Zaugg. *Quantification of differential transcription factor activity and multiomics-based classification into activators and repressors: diffTF*. 2019. in review.
Ivan Berest*, Christian Arnold*, Armando Reyes-Palomares, Giovanni Palla, Kasper Dindler Rasmussen, Holly Giles, Peter-Martin Bruch, Wolfgang Huber, Sascha Dietrich, Kristian Helin, Judith B. Zaugg. *Quantification of Differential Transcription Factor Activity and Multiomics-Based Classification into Activators and Repressors: diffTF*. 2019. Cell Reports 29(10), P3147-3159.E12.
Open Access. DOI:https://doi.org/10.1016/j.celrep.2019.10.106
We also put the paper on *bioRxiv*, please read all methodological details here:
`Quantification of differential transcription factor activity and multiomic-based classification into activators and repressors: diffTF <https://www.biorxiv.org/content/early/2018/12/01/368498>`_.
.. We also put the paper on *bioRxiv*, please read all methodological details here:
.. `Quantification of differential transcription factor activity and multiomic-based classification into activators and repressors: diffTF <https://www.biorxiv.org/content/early/2018/12/01/368498>`_.
.. _changelog:
Change log
============================
Version 1.5 (2019-12-03)
- *diffTF* has been published in Cell Reports! See the section :ref:`citation` for details.
- raw and adjusted p-values for the permutation-based approach can now not be 0 anymore. We now use the approach described `here<https://genomicsclass.github.io/book/pages/permutation_tests.html>`_. In a nutshell, the smallest p-value is now 1/(*nPermutations* + 1), with *nPermutations* denoting the number of permutations and thereby depends on the number of permutations - having more permutations makes the minimum p-value smaller.
- for plotting TF densities, a fixed bandwidth of 0.1 was used before. We now removed this and bandwidth is determined automatically. We noticed that in some cases, the fixed bandwidth may lead to a smoothing of the density curves while without fixing it, the densities look more rugged. As we do not want to introduce visual artifacts, we decided to remove it.
- various minor code improvements, particularly related to the AR classification
- small fixes in the Snakefile
Version 1.4 (2019-09-24)
- Various small improvements for increasing user experience
......
......@@ -189,13 +189,8 @@ outputSummary.df = tribble(~permutation, ~TF, ~Pos_l2FC, ~Mean_l2FC, ~Median_l2
# READ OVERLAP FILE #
#####################
overlapsAll.df = read_tsv(par.l$file_input_peakTFOverlaps, col_names = TRUE, col_types = cols(), comment = "#")
if (nrow(problems(overlapsAll.df)) > 0) {
flog.fatal(paste0("Parsing errors: "), problems(overlapsAll.df), capture = TRUE)
stop("Error when parsing the file ", fileCur, ", see errors above")
}
overlapsAll.df = read_tidyverse_wrapper(par.l$file_input_peakTFOverlaps, type = "tsv",
col_names = TRUE, col_types = cols(), comment = "#")
if (nrow(overlapsAll.df) > 0) {
......@@ -326,11 +321,7 @@ if (skipTF) {
# Preallocate data frame so no expensive reallocation has to be done
log2fc.m = matrix(NA, nrow = nPeaks , ncol = par.l$nPermutations + 1)
peaks.df = read_tsv(par.l$file_input_peak2, col_types = cols())
if (nrow(problems(peaks.df)) > 0) {
flog.fatal(paste0("Parsing errors: "), problems(peaks.df), capture = TRUE)
stop("Error when parsing the file ", par.l$file_input_peak2, ", see errors above")
}
peaks.df = read_tidyverse_wrapper(par.l$file_input_peak2, type = "tsv", col_types = cols())
peaksFiltered.df = readRDS(par.l$file_input_peaks)
......@@ -344,6 +335,7 @@ if (skipTF) {
if (par.l$nPermutations > 0) {
# Generate normalized counts for limma analysis
countsRaw = DESeq2::counts(TF.cds.filt, norm = FALSE)
countsNorm = DESeq2::counts(TF.cds.filt, norm = TRUE)
countsNorm.transf = log2(countsNorm + par.l$pseudocountAddition)
rownames(countsNorm.transf) = rownames(TF.cds.filt)
......@@ -355,11 +347,12 @@ if (skipTF) {
for (permutationCur in 0:par.l$nPermutations) {
if (permutationCur == 0) {
flog.info(paste0("Running for real data"))
} else {
flog.info(paste0("Running for permutation ", permutationCur))
}
if (permutationCur > 0 & (permutationCur %% 10 == 0 | permutationCur == par.l$nPermutations)) {
flog.info(paste0("Running permutation ", permutationCur))
} else {
flog.info(paste0("Running for real data "))
}
sampleData.df = sampleData.l[[paste0("permutation", permutationCur)]]
##############################
......@@ -392,14 +385,14 @@ if (skipTF) {
# We already set the factors for conditionSummary explicitly. The reference level is the first level for DeSeq.
# Run the local fit first, if that throws an error try the default fit type
res_DESeq = tryCatch( {
TF.cds.filt = tryCatch( {
suppressMessages(DESeq(TF.cds.filt,fitType = 'local'))
}, error = function(e) {
message = "Could not run DESeq with local fitting, retry with default fitting type..."
checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
res_DESeq = tryCatch( {
TF.cds.filt = tryCatch( {
suppressMessages(DESeq(TF.cds.filt))
}, error = function(e) {
......@@ -408,12 +401,12 @@ if (skipTF) {
}
)
res_DESeq
TF.cds.filt
}
)
if (class(res_DESeq) == "character") {
if (class(TF.cds.filt) == "character") {
skipTF = TRUE
TF_outputInclPerm.df = as.data.frame(matrix(nrow = 0, ncol = 2 + par.l$nPermutations + 1))
......@@ -425,14 +418,18 @@ if (skipTF) {
#Enforce the correct order of the comparison
if (comparisonMode == "pairwise") {
res_DESeq.df <- as.data.frame(DESeq2::results(res_DESeq, contrast = c(variableToPermute, conditionComparison[1], conditionComparison[2])))
contrast = c(variableToPermute, conditionComparison[1], conditionComparison[2])
} else {
# Same as without specifying contrast at all
res_DESeq.df <- as.data.frame(DESeq2::results(res_DESeq, contrast = list(variableToPermute)))
contrast = list(variableToPermute)
}
res_DESeq = DESeq2::results(TF.cds.filt, contrast = contrast)
res_DESeq.df <- as.data.frame(res_DESeq)
final.TF.df = tibble("TFBSID" = rownames(res_DESeq.df),
......@@ -475,9 +472,10 @@ if (skipTF) {
pdf(par.l$file_output_plot_diagnostic)
if (par.l$nPermutations == 0) {
plotDiagnosticPlots(TF.cds.filt, res_DESeq, conditionComparison, filename = NULL, maxPairwiseComparisons = 0, plotMA = FALSE)
plotDiagnosticPlots(TF.cds.filt, conditionComparison, file = NULL, maxPairwiseComparisons = 0, plotMA = FALSE)
} else {
plotDiagnosticPlots(TF.cds.filt, fit, conditionComparison, filename = NULL, maxPairwiseComparisons = 0, plotMA = FALSE)
plotDiagnosticPlots(fit, conditionComparison, file = NULL, maxPairwiseComparisons = 0, plotMA = FALSE, counts.raw = countsRaw, counts.norm = countsNorm)
}
......
......@@ -104,7 +104,7 @@ testExistanceAndCreateDirectoriesRecursively(allDirs)
TFCur = par.l$TF
# TODO: perm.l with all TF instead of TFCur
perm.l = list()
calculateVariance = par.l$nPermutations == 0
......@@ -161,31 +161,23 @@ if (par.l$nPermutations + 1 < length(sampleData.l)) {
# READ NUC CG FILE #
####################
flog.info(paste0("Reading and processing file ", fileCur))
TF.motifs.CG = read_tsv(par.l$file_input_nucContentGenome, col_names = TRUE,
col_types = list(
col_skip(), # "chr"
col_skip(), # "MSS"
col_skip(), # "MES"
col_character(), # "TFBSID"
col_character(), # "TF",
col_skip(), # "AT"
col_double(), # "CG"
col_skip(), # "A"
col_skip(), # "C"
col_skip(), # "G"
col_skip(), # "T"
col_skip(), # "N"
col_skip(), # "other_nucl"
col_skip() # "length"
)
)
if (nrow(problems(TF.motifs.CG)) > 0) {
flog.fatal(paste0("Parsing errors: "), problems(TF.motifs.CG), capture = TRUE)
stop("Error when parsing the file ", fileCur, ", see errors above")
}
TF.motifs.CG = read_tidyverse_wrapper(par.l$file_input_nucContentGenome, type = "tsv", col_names = TRUE,
col_types = list(
col_skip(), # "chr"
col_skip(), # "MSS"
col_skip(), # "MES"
col_character(), # "TFBSID"
col_character(), # "TF",
col_skip(), # "AT"
col_double(), # "CG"
col_skip(), # "A"
col_skip(), # "C"
col_skip(), # "G"
col_skip(), # "T"
col_skip(), # "N"
col_skip(), # "other_nucl"
col_skip() # "length"
))
colnames(TF.motifs.CG) = c("TFBSID","TF","CG")
......@@ -220,39 +212,23 @@ nPermutationsSkipped = 0
for (fileCur in par.l$files_input_TF_allMotives) {
# Log 2 fold-changes from the particular permutation
TF.motifs.ori = read_tsv(fileCur, col_names = TRUE,
col_types = list(
col_character(), # "TF",
col_character(), # "TFBSID"
col_double() # "log2FoldChange", important to have double here as col_number would not parse numbers in scientific notation correctly
)
)
TF.motifs.ori = read_tidyverse_wrapper(fileCur, type = "tsv", col_names = TRUE,
col_types = list(
col_character(), # "TF",
col_character(), # "TFBSID"
col_double() # "log2FoldChange", important to have double here as col_number would not parse numbers in scientific notation correctly
))
if (nrow(problems(TF.motifs.ori)) > 0) {
flog.fatal(paste0("Parsing errors: "), problems(TF.motifs.ori), capture = TRUE)
stop("Error when parsing the file ", fileCur, ", see errors above")
}
if (nrow(TF.motifs.ori) == 0) {
message = paste0("The file ", fileCur, " is empty. Something went wrong before. Make sure the previous steps succeeded.")
checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
}
if (ncol(TF.motifs.ori) != 3) {
message = paste0("The file ", fileCur, " does not have 3 columns. Something is wrong with the number of permutations. We recommend restarting the pipeline from the DiffPeaks step.")
checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
}
colnames(TF.motifs.ori) = c("TF", "TFBSID", "log2FoldChange")
permutationCur = as.numeric(gsub(".*perm([0-9]+).tsv.gz", '\\1', fileCur))
TF.motifs.ori$permutation = permutationCur
if (permutationCur > 0) {
if (permutationCur > 0 & (permutationCur %% 10 == 0 | permutationCur == par.l$nPermutations)) {
flog.info(paste0("Running permutation ", permutationCur))
} else {
flog.info(paste0("Running for real data ", permutationCur))
flog.info(paste0("Running for real data "))
}
......@@ -289,8 +265,6 @@ for (fileCur in par.l$files_input_TF_allMotives) {
nBinsAll = length(levels(TF.motifs.all$CG.bins))
nCol = ncol(perm.l[[TFCur]])
# TODO: TFBSID needed?
flog.info(paste0(" Found ", nrow(TF.motifs.all) - nrow(TF.motifs.all.unique), " duplicated TFBS across all TF."))
# TODO: Optimize as in dev TF.motifs.all.unique = TF.motifs.all.unique[which(TF.motifs.all.unique$TF != TFCur & is.finite(TF.motifs.all.unique$log2FoldChange)),]
......
This diff is collapsed.
......@@ -134,8 +134,7 @@ if (par.l$nPermutations == 0 && par.l$nBootstraps < 1000) {
# CHECK FASTA AND BAM FILES #
#############################
sampleData.df = read_tsv(par.l$file_input_sampleData, col_names = TRUE, col_types = cols())
sampleData.df = read_tidyverse_wrapper(par.l$file_input_sampleData, type = "tsv", col_names = TRUE, col_types = cols())
components3types = checkDesignIntegrity(snakemake, par.l, sampleData.df)$types
......@@ -244,12 +243,8 @@ flog.info(paste0("Check peak files..."))
if (file_peaks != "") {
assertFileExists(snakemake@config$peaks$consensusPeaks)
peaks.df = read_tsv(snakemake@config$peaks$consensusPeaks, col_names = FALSE)
if (nrow(problems(peaks.df)) > 0) {
flog.fatal(paste0("Parsing errors: "), problems(peaks.df), capture = TRUE)
stop("Parsing errors with file ", snakemake@config$peaks$consensusPeaks, ". See the log file for more information")
}
peaks.df = read_tidyverse_wrapper(snakemake@config$peaks$consensusPeaks, type = "tsv", col_names = FALSE)
flog.info(paste0("Peak file contains ", nrow(peaks.df), " peaks."))
if (nrow(peaks.df) > 100000) {
......@@ -343,20 +338,8 @@ for (TFCur in allTFs) {
checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
}
tableCur.df = read_tsv(fileCur, col_names = FALSE,
col_types = "ciicnc")
if (nrow(problems(tableCur.df)) > 0) {
flog.fatal(paste0("Parsing errors: "), problems(tableCur.df), capture = TRUE)
stop("Parsing errors for file ", TFCur, ". See the log file for more information")
}
ncols = ncol(tableCur.df)
if (ncols != 6) {
message = paste0("Number of columns must be 6, but file ", TFCur, " has ", ncols, " instead")
checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
}
tableCur.df = read_tidyverse_wrapper(fileCur, type = "tsv", ncolExpected = 6, col_names = FALSE, col_types = "ciicnc")
assertIntegerish(tableCur.df$X2, lower = 0)
assertIntegerish(tableCur.df$X3, lower = 0)
......
......@@ -90,7 +90,8 @@ printParametersLog(par.l)
##########################
# Provide the metadata file and parse the CSV here
sampleMetaData.df = read_tsv(file_sampleData, col_types = cols())
sampleMetaData.df = read_tidyverse_wrapper(file_sampleData, type = "tsv",
col_types = cols())
assertSubset(c("SampleID", "bamReads", "conditionSummary", "Peaks"), colnames(sampleMetaData.df))
assertIntegerish(minOverlap, upper = nrow(sampleMetaData.df))
......
......@@ -119,7 +119,8 @@ printParametersLog(par.l)
# READ METADATA #
#################
sampleData.df = read_tsv(par.l$file_input_sampleData, col_names = TRUE, col_types = cols())
sampleData.df = read_tidyverse_wrapper(par.l$file_input_sampleData, type = "tsv",
col_names = TRUE, col_types = cols())
checkAndLogWarningsAndErrors(colnames(sampleData.df), checkSubset(c("bamReads"), colnames(sampleData.df)))
......@@ -169,18 +170,8 @@ if (comparisonMode == "pairwise") {
# ITERATE THROUGH PEAK FILES #
##############################
coverageAll.df = read_tsv(par.l$file_input_peakOverlaps, col_names = TRUE, comment = "#", col_types = cols())
if (nrow(problems(coverageAll.df)) > 0) {
flog.fatal(paste0("Parsing errors: "), problems(coverageAll.df), capture = TRUE)
stop("Error when parsing the file ", fileCur, ", see errors above")
}
if (nrow(coverageAll.df) == 0) {
message = paste0("Empty file ", par.l$file_input_peaks, ".")
checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
}
coverageAll.df = read_tidyverse_wrapper(par.l$file_input_peakOverlaps, type = "tsv",
col_names = TRUE, comment = "#", col_types = cols())
## transform as matrix data frame with counts
coverageAll.m = as.matrix(dplyr::select(coverageAll.df, -one_of("Geneid", "Chr", "Start", "End", "Strand", "Length")))
......@@ -399,66 +390,30 @@ countsNorm.df = as.data.frame(countsNorm) %>%
dplyr::select(one_of("peakID", colnames(countsNorm)))
if (par.l$nPermutations == 0) {
# Generate normalized counts for limma analysis
countsNorm.transf = log2(countsNorm + par.l$pseudocountAddition)
rownames(countsNorm.transf) = rownames(cds.peaks.filt)
sampleData.df$conditionSummary = factor(sampleData.df$conditionSummary)
designMatrix = model.matrix(designFormula, data = sampleData.df)
if (nrow(designMatrix) < nrow(sampleData.df)) {
missingRows = setdiff(1:nrow(sampleData.df), as.integer(row.names(designMatrix)))
message = paste0("There is a problem with the specified design formula (parameter designContrast): The corresponding design matrix has fewer rows. This usually means that there are missing values in one of the specified variables. The problem comes from the following lines in the summary file: ", paste0(missingRows, collapse = ","), ".")
checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
}
fit <- eBayes(lmFit(countsNorm.transf, design = designMatrix))
results.df <- topTable(fit, coef = colnames(fit$design)[ncol(fit$design)], number = Inf, sort.by = "none")
final.peaks.df = tibble(
"permutation" = 0,
"peakID" = rownames(results.df),
"limma_avgExpr" = results.df$AveExpr,
"l2FC" = results.df$logFC,
"limma_B" = results.df$B,
"limma_t_stat" = results.df$t,
"pval" = results.df$P.Value,
"pval_adj" = results.df$adj.P.Val
)
plotDiagnosticPlots(cds.peaks.filt, fit, comparisonDESeq, par.l$file_output_plots, maxPairwiseComparisons = 0, plotMA = TRUE)
} else {
# Deseq analysis
cds.peaks.filt = tryCatch( {
# Deseq analysis
cds.peaks.filt = tryCatch( {
DESeq(cds.peaks.filt, fitType = 'local', quiet = TRUE)
}, error = function(e) {
}, error = function(e) {
message = "Warning: Could not run DESeq with local fitting, retry with default fitting type..."
checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
DESeq(cds.peaks.filt, quiet = TRUE)
}
)
#Enforce the correct order of the comparison
if (comparisonMode == "pairwise") {
cds.peaks.df <- as.data.frame(DESeq2::results(cds.peaks.filt, contrast = c(variableToPermute, comparisonDESeq[1], comparisonDESeq[2])))
} else {
# Same as without specifying contrast at all
cds.peaks.df <- as.data.frame(DESeq2::results(cds.peaks.filt, contrast = list(variableToPermute)))
}
}
)
#Enforce the correct order of the comparison
if (comparisonMode == "pairwise") {
final.peaks.df = tibble(
contrast = c(variableToPermute, comparisonDESeq[1], comparisonDESeq[2])
} else {
# Same as without specifying contrast at all
contrast = list(variableToPermute)
}
cds.peaks.df <- as.data.frame(DESeq2::results(cds.peaks.filt, contrast = contrast))
final.peaks.df = tibble(
"permutation" = 0,
"peakID" = rownames(cds.peaks.df),
"DESeq_baseMean" = cds.peaks.df$baseMean,
......@@ -467,11 +422,9 @@ if (par.l$nPermutations == 0) {
"DESeq_stat" = cds.peaks.df$stat,
"pval" = cds.peaks.df$pvalue,
"pval_adj" = cds.peaks.df$padj
)
plotDiagnosticPlots(cds.peaks.filt, cds.peaks.filt, comparisonDESeq, par.l$file_output_plots, maxPairwiseComparisons = 20)
}
)
plotDiagnosticPlots(cds.peaks.filt, comparisonDESeq, file = par.l$file_output_plots, contrast = contrast, maxPairwiseComparisons = 20)
saveRDS(cds.peaks.filt, file = par.l$file_output_DESeqObj)
......@@ -504,7 +457,6 @@ if (par.l$nPermutations > 0) {
################
# WRITE OUTPUT #
################
......
This source diff could not be displayed because it is too large. You can view the blob instead.
......@@ -96,8 +96,7 @@ for (fileCur in par.l$files_input_TF_summary) {
}
peaks.df = read_tsv(par.l$file_input_peaks, col_types = cols())
peaks.df = read_tidyverse_wrapper(par.l$file_input_peaks, type = "tsv", col_types = cols())
summary.df = NULL
......
This diff is collapsed.
......@@ -198,6 +198,8 @@ else:
allBamFilesBasename = []
for fileCur in allBamFiles:
if fileCur.endswith(" ") or fileCur.startswith(" "):
raise IOError("File \"" + fileCur + "\" starts or ends with a white space, this should be corrected in the file " + config["samples"]["summaryFile"] + ".")
if not os.path.isfile(fileCur):
raise IOError("File \"" + fileCur + "\" (defined in " + config["samples"]["summaryFile"] + ") not found.")
# Index is not needed for featureCounts
......@@ -258,7 +260,11 @@ if config["peaks"]["consensusPeaks"]:
raise IOError("File \"" + filenameCur + "\" not found.")
# Check if it contains scientific notification
# the || true in the end ensures the exit status of grep is 0, because this would raise an error otherwise
nHits = int(subprocess.check_output('grep -c "e+" ' + config["peaks"]["consensusPeaks"] + " || true", shell=True))
grep_variant = "grep"
if filenameCur.endswith("gz"):
grep_variant = "zgrep"
nHits = int(subprocess.check_output(grep_variant + ' -c "e+" ' + config["peaks"]["consensusPeaks"] + " || true", shell=True))
if nHits > 0:
raise AssertionError("File " + config["peaks"]["consensusPeaks"] + " contains at least one line with the scientific notation (e+). This will cause errors in subsequent steps. Check the file and transform all \"e+\" coordinates.")
......@@ -463,7 +469,7 @@ rule intersectPeaksAndTFBS:
TFBSinPeaksMod_bed = expand('{dir}/{compType}allTFBS.peaks.extension.bed.gz', dir = TEMP_EXTENSION_DIR, compType = compType)
message: "{ruleDisplayMessage} Obtain binding sites from peaks: Intersect all TFBS files and {input.consensusPeaks}..."
threads: 1
#singularity: "shub://chrarnold/Singularity_images:difftf_conda"
singularity: "shub://chrarnold/Singularity_images:difftf_conda"
params:
extension = config["par_general"]["regionExtension"],
ulimitMax = ulimitMax
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment