Commit 95b75e18 authored by Christian Arnold

Version 1.1.6, see Changelog for details

parent afac10fa
Pipeline #6463 failed with stages
in 37 seconds
......@@ -31,6 +31,12 @@ We also put the paper on *bioRxiv*, please read all methodological details here:
Change log
============================
Version 1.1.6 (2018-10-11)
- fixed a small issue in ``checkParameterValidity.R`` that occurred when the user did not have sufficient permissions for the folder in which the fasta file is located
- updated the ``summaryFinal.R`` script. The Volcano plot PDF now also shows the raw p-values at the end, in addition to the adj. p-values. This might be helpful for datasets with a weak signal in which no adj. p-value is significant. In addition, labeling of TFs is now skipped when the number of TFs to label exceeds 150. This makes the step faster and the PDF smaller and less crowded.
- small updates to the translation table for mm10
- added two local rules to the Snakefile for potential minor speed improvements when running in cluster mode
Version 1.1.5 (2018-08-14)
- optimized ``checkParameterValidity.R`` script, only TFBS files for TFs included in the analysis are now checked
- addressed an R library compatibility issue independent of *diffTF* that users reported. In some cases, for particular versions of R and Bioconductor, R exited with a *segfault* (memory not mapped) error in the ``checkParameterValidity.R`` script that seems to be caused by the combination of *DiffBind* and *DESeq2*. Specifically, when *DiffBind* is loaded *before* *DESeq2*, R crashes with a segmentation fault upon exiting, whereas loading *DiffBind* *after* *DESeq2* causes no issue. If there are further issues, please let us know. Thanks to Gyan Prakash Mishra, who first reported this.
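The load order described in the 1.1.5 entry can be sketched as follows. This is a minimal illustration and not the actual code of ``checkParameterValidity.R``: the helper name `loadInSafeOrder` is hypothetical, and the `requireNamespace` guard is an assumption added so that the snippet also runs on systems where the packages are not installed.

```r
# Hypothetical helper (not part of diffTF): attach DESeq2 before DiffBind,
# which is the order the changelog reports as causing no issue.
loadInSafeOrder <- function(pkgs = c("DESeq2", "DiffBind")) {
  for (pkg in pkgs) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      # character.only = TRUE is needed because pkg holds the name as a string
      library(pkg, character.only = TRUE)
    } else {
      message("Package not installed, skipping: ", pkg)
    }
  }
  invisible(pkgs)
}

loadInSafeOrder()
```

The order of the vector is the only thing that matters here; whether this avoids the crash on a given R and Bioconductor version is something to verify locally.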
......
......@@ -210,8 +210,7 @@ nTFBS = nrow(overlapsAll.df)
if (nTFBS >= par.l$minNoDatapoints) {
- # Group by peak ID
- # take only the maximum row mean of all samples, sample with biggest coverage
+ # Group by peak ID: To avoid biases and dependencies based on TFBS clustering within peaks, we then select the TFBS per TF per peak with the highest average read count across all samples.
coverageAll_grouped.df = overlapsAll.df %>%
dplyr::group_by(peakID) %>%
dplyr::slice(which.max(mean)) %>%
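The per-peak selection in this hunk can be tried out on toy data. The sketch below assumes a data frame with the columns `peakID` and `mean` as used in the hunk; the `TFBSID` column, the object names, and all values are made up for illustration and do not come from *diffTF*.

```r
library(dplyr)

# Toy data (made up): three TFBS fall into peak "p1", one into peak "p2".
overlapsToy.df <- data.frame(
  peakID = c("p1", "p1", "p1", "p2"),
  TFBSID = c("tfbs1", "tfbs2", "tfbs3", "tfbs4"),
  mean   = c(5.0, 9.5, 2.0, 3.0),
  stringsAsFactors = FALSE
)

# Per peak, keep only the TFBS with the highest mean read count across samples.
# Inside slice(), `mean` refers to the column of the current group.
selectedToy.df <- overlapsToy.df %>%
  dplyr::group_by(peakID) %>%
  dplyr::slice(which.max(mean)) %>%
  dplyr::ungroup()

print(selectedToy.df$TFBSID)
```

Note that `which.max` returns the first maximum, so in case of ties within a peak the TFBS that appears first in the data is kept.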
......@@ -395,7 +394,12 @@ if (skipTF) {
}
)
- if (class(res_DESeq) == "character") skipTF = TRUE
+ if (class(res_DESeq) == "character") {
+   skipTF = TRUE
+   TF_outputInclPerm.df = as.data.frame(matrix(nrow = 0, ncol = 2 + par.l$nPermutations + 1))
+   colnames(TF_outputInclPerm.df) = c("TF", "TFBSID", paste0("log2fc_perm", 0:par.l$nPermutations))
+ }
if (!skipTF) {
res_DESeq.df <- as.data.frame(DESeq2::results(res_DESeq))
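The fallback added in this hunk, an empty table with the expected columns when the *DESeq2* step fails, can be sketched in isolation. The code below is an assumption-laden illustration: `nPermutations` is a stand-in for `par.l$nPermutations`, the failing call is simulated with `stop()`, and the error handler returning a character string is inferred from the class check in the hunk rather than shown in it.

```r
# Assumed value for illustration; in the diff this is par.l$nPermutations.
nPermutations <- 3

# Stand-in for the DESeq2 call: on error, the handler returns the message
# as a character string instead of a results object.
res <- tryCatch(
  stop("simulated failure"),
  error = function(e) conditionMessage(e)
)

skipTF <- FALSE
if (is.character(res)) {
  skipTF <- TRUE
  # Zero-row table with the column layout from the hunk: TF, TFBSID, and
  # one log2fc_perm column for each index 0..nPermutations.
  emptyOutput.df <- as.data.frame(matrix(nrow = 0, ncol = 2 + nPermutations + 1))
  colnames(emptyOutput.df) <- c("TF", "TFBSID", paste0("log2fc_perm", 0:nPermutations))
}

print(dim(emptyOutput.df))
print(colnames(emptyOutput.df))
```

The sketch uses `is.character()` rather than comparing `class()` with a string, because `class()` can return more than one element for some objects; this is a design choice of the sketch and not what the commit does.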
......
......@@ -52,7 +52,7 @@ assertDirectoryExists(dirname(TFBS_dir), access = "r")
fastaFile = snakemake@config$additionalInputFiles$refGenome_fasta
assertFileExists(fastaFile)
- assertDirectoryExists(dirname(fastaFile), access = "w")
+ assertDirectoryExists(dirname(fastaFile), access = "r")
allTFs = strsplit(snakemake@config$par_general$TFs, ",")[[1]]
......
......@@ -301,7 +301,7 @@ script_summaryFinal = "summaryFinal.R"
#########################################
# For cluster usage: The keyword localrules allows to mark a rule as local, so that it is not submitted to the cluster and instead executed on the host node
- localrules: all
+ localrules: all,cleanUpLogFiles,filterSexChromosomesAndSortPeaks
rule all:
input:
......
......@@ -28,6 +28,10 @@ Cdc5l ENSMUSG00000023932 CDC5L
Cdx1 ENSMUSG00000024619 CDX1
Cdx2 ENSMUSG00000029646 CDX2
Cdx4 ENSMUSG00000031326 CDX4
Cdx4Hoxc9 ENSMUSG00000036139 CDX4HOXC9
Cdx4HT ENSMUSG00000031326 CDX4HT
Cdx4met1 ENSMUSG00000031326 CDX4MET1
Cdx4met2 ENSMUSG00000031326 CDX4MET2
Cebpa ENSMUSG00000034957 CEBPA
Cebpb ENSMUSG00000056501 CEBPB
Cebpd ENSMUSG00000071637 CEBPD
......