Commit 1abe6265 authored by Christian Arnold

Version 1.1.5, see Changelog

parent a43fd127
......@@ -987,7 +987,10 @@ We here provide a list of some of the errors that can happen and that users repo
Segmentation fault
...
This unfortunate message points to a problem with your R and R libraries installation and has per se nothing to do with *diffTF*. At least one of the installed libraries has an issue. We advise to reinstall *Bioconductor* in such a case, and ask someone who is experienced with this to help you. Unfortunately, this issue is so general that we cannot provide any specific solutions as this type of error is very general. To troubleshoot and identify exactly which library or function causes this, you may run the R script that failed in debug mode and go through it line by line. See the next section for more details.
.. note:: This particular message may also be related to an incompatibility of the *DiffBind* and *DESeq2* libraries. See the changelog for details, as this has been addressed in version 1.1.5.
More generally, however, such messages point to a problem with your R and R libraries installation and have, per se, nothing to do with *diffTF*. In such cases, we advise reinstalling the latest version of *Bioconductor* and asking someone experienced with R installations for help. Unfortunately, this type of error is so general that we cannot provide a specific solution. To troubleshoot and identify exactly which library or function causes it, you may run the failing R script in debug mode and step through it line by line. See the next section for more details.
Fixing the error
......
......@@ -31,14 +31,21 @@ We also put the paper on *bioRxiv*, please read all methodological details here:
Change log
============================
Version 1.1.5 (2018-08-14)
- optimized ``checkParameterValidity.R`` script, only TFBS files for TFs included in the analysis are now checked
- addressed an R library compatibility issue independent of *diffTF* that users reported. In some cases, for particular versions of R and Bioconductor, R exited with a *segfault* (memory not mapped) error in the ``checkParameterValidity.R`` script that seems to be caused by the combination of *DiffBind* and *DESeq2*. Specifically, when *DiffBind* is loaded *before* *DESeq2*, R crashes with a segmentation fault upon exiting, whereas loading *DiffBind* *after* *DESeq2* causes no issue. If there are further issues, please let us know. Thanks to Gyan Prakash Mishra, who first reported this.
- fixed an issue when the number of peaks is very small, so that some TFs have no overlapping TFBS in the peak regions at all. This caused the rule ``intersectTFBSAndBAM`` to exit with an error because grep returns exit code 1 when it finds no matches (thanks to Jonas Ungerbeck, again)
- removed the ``--timestamp`` option in the helper script ``startAnalysis.sh`` because this option was removed in Snakemake versions after 5.2.1
- Documentation updates
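The grep pitfall fixed above can be reproduced in isolation. This is a minimal sketch (the file name and pattern are made up for illustration): under ``set -e``, grep's exit code 1 on zero matches aborts the whole script unless the call is guarded with ``|| true``, which is exactly the guard the rule ``intersectTFBSAndBAM`` now uses.

```shell
set -e
# A toy input file with no line matching the pattern we search for
printf 'peak1\npeak2\n' > /tmp/peaks_demo.txt

# grep exits with code 1 when nothing matches; without '|| true'
# this command substitution would terminate the script under 'set -e'
matches=$(grep 'TFBS' /tmp/peaks_demo.txt || true)

echo "found: '${matches}'"
```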
Version 1.1.4 (2018-08-09)
- minor, updated the checkParameterValidity.R script and the documentation (one package was not mentioned)
- minor, updated the ``checkParameterValidity.R`` script and the documentation (one package was not mentioned)
Version 1.1.3 (2018-08-06)
- minor, fixed a small issue in the Volcano plot (the legend was wrong and the plot background was not colored properly)
Version 1.1.2 (2018-08-03)
- fixed a bug that made the ``3.analyzeTF`` script fail in case when the number of permutations has been changed throughout the analysis or when the value is higher than the actual maximum number (thanks to Jonas Ungerbeck)
- fixed a bug that made the ``3.analyzeTF.R`` script fail when the number of permutations was changed during the analysis or when it exceeded the actual maximum number of permutations (thanks to Jonas Ungerbeck)
Version 1.1.1 (2018-08-01)
- Documentation updates (referenced the bioRxiv paper, extended the section about errors)
......
......@@ -8,7 +8,7 @@
"conditionComparison": "GMP,MPP",
"designContrast": "~ conditionSummary",
"designVariableTypes": "conditionSummary:factor",
"nPermutations": 70,
"nPermutations": 100,
"nBootstraps": 0,
"nCGBins": 10,
"TFs": "CTCF,CEBPB,SNAI2,CEBPA,UBIP1,CEBPG,CEBPD,ZFX,AP2D,PAX5.S,SNAI1,ZEB1,SP4,MBD2,IRF1,MECP2,PAX5.D,SP3,NFIA.C,SP1.A,IRF7,MYF6,NRF1,DBP,MAZ,NKX28,DLX2,GATA1,P53,ZN143,AIRE,NR2C2,HMGA1,FUBP1,TEAD3,OVOL1,HXD4,KLF1,RXRG,HNF1B,ZIC3,HNF1A,NANOG.S,GFI1,PO3F1,NR2C1,ELF5,TF65.C,NFAC3,TEAD1",
......
......@@ -17,4 +17,4 @@ echo "# INCREASING THE NUMBER OF CORES SPEEDS UP THE ANALYSIS #"
echo "#########################################################"
# Real run, using 2 cores
snakemake --snakefile ../../src/Snakefile --cores 2 --configfile config.json --timestamp
snakemake --snakefile ../../src/Snakefile --cores 2 --configfile config.json
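The wrapper change above simply drops ``--timestamp``, which Snakemake removed after version 5.2.1. As a hypothetical guard (not part of diffTF), an old wrapper command could strip the stale flag before running, so users with outdated scripts get a hint instead of a hard failure:

```shell
# Hypothetical example: the command string below mirrors the wrapper's old
# invocation; '--timestamp' is no longer accepted by Snakemake > 5.2.1
cmd="snakemake --snakefile ../../src/Snakefile --cores 2 --configfile config.json --timestamp"

if [[ "$cmd" == *"--timestamp"* ]]; then
    echo "note: --timestamp was removed in Snakemake > 5.2.1; dropping it"
    cmd="${cmd/ --timestamp/}"   # remove the flag and its leading space
fi

echo "$cmd"
```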
......@@ -181,21 +181,29 @@ if (nrow(problems(overlapsAll.df)) > 0) {
stop("Error when parsing the file ", fileCur, ", see errors above")
}
colnames(overlapsAll.df) = c("annotation", "chr","MSS","MES", "strand","length", colnamesNew)
overlapsAll.df = overlapsAll.df %>%
dplyr::mutate(TFBSID = paste0(chr,":", MSS, "-",MES),
mean = apply(dplyr::select(overlapsAll.df, one_of(colnamesNew)), 1, mean),
peakID = sapply(strsplit(overlapsAll.df$annotation, split = "_", fixed = TRUE),"[[", 1)) %>%
dplyr::distinct(TFBSID, .keep_all = TRUE) %>%
dplyr::select(-one_of("length"))
if (nrow(overlapsAll.df) > 0) {
    colnames(overlapsAll.df) = c("annotation", "chr", "MSS", "MES", "strand", "length", colnamesNew)
    overlapsAll.df = overlapsAll.df %>%
        dplyr::mutate(TFBSID = paste0(chr, ":", MSS, "-", MES),
                      mean = apply(dplyr::select(overlapsAll.df, one_of(colnamesNew)), 1, mean),
                      peakID = sapply(strsplit(overlapsAll.df$annotation, split = "_", fixed = TRUE), "[[", 1)) %>%
        dplyr::distinct(TFBSID, .keep_all = TRUE) %>%
        dplyr::select(-one_of("length"))
    skipTF = FALSE
} else {
    skipTF = TRUE
}
nTFBS = nrow(overlapsAll.df)
skipTF = FALSE
......
......@@ -282,6 +282,9 @@ for (fileCur in par.l$files_input_TF_allMotives) {
flog.info(paste0(" Found ", nrow(TF.motifs.all) - nrow(TF.motifs.all.unique), " duplicated TFBS across all TF."))
# TODO: Optimize as in dev TF.motifs.all.unique = TF.motifs.all.unique[which(TF.motifs.all.unique$TF != TFCur & is.finite(TF.motifs.all.unique$log2FoldChange)),]
nRowsTF = nrow(TF.motifs.all[which(TF.motifs.all$TF == TFCur),])
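The duplicate count logged by the R code above (total TFBS rows minus unique rows) can be illustrated with a small shell analogue; the file and coordinates below are made up for the example:

```shell
# Toy TFBS list with one duplicated entry (chr1:100-110 appears twice)
printf 'chr1:100-110\nchr1:100-110\nchr2:50-60\n' > /tmp/tfbs_demo.txt

total=$(wc -l < /tmp/tfbs_demo.txt)          # all rows
unique=$(sort -u /tmp/tfbs_demo.txt | wc -l) # rows after de-duplication

# Same arithmetic as nrow(TF.motifs.all) - nrow(TF.motifs.all.unique)
echo "Found $((total - unique)) duplicated TFBS across all TF."
```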
......
......@@ -471,7 +471,6 @@ rule intersectPeaksAndTFBS:
"""
rule intersectTFBSAndBAM:
input:
bed = rules.intersectPeaksAndTFBS.output.TFBSinPeaksMod_bed,
......@@ -482,7 +481,7 @@ rule intersectTFBSAndBAM:
BAMOverlap = TF_DIR + "/{TF}/" + extDir + "/" + compType + "{TF}.allBAMs.overlaps.bed.gz",
saf = temp(expand('{dir}/{compType}{{TF}}.allTFBS.peaks.extension.saf', dir = TEMP_EXTENSION_DIR, compType = compType))
log:
message: "{ruleDisplayMessage} Intersect file {input.bed} against all BAM files..."
message: "{ruleDisplayMessage} Intersect file {input.bed} against all BAM files for TF {wildcards.TF}..."
threads: 4
params:
pairedEnd = pairedEndOptions,
......@@ -491,20 +490,27 @@ rule intersectTFBSAndBAM:
ulimitMax = ulimitMax
shell:
""" ulimit -n {params.ulimitMax} &&
zgrep "{wildcards.TF}_TFBS\." {input.bed} | awk 'BEGIN {{ OFS = "\\t" }} {{print $4"_"$2"-"$3,$1,$2,$3,$6}}' | sort -u -k1,1 >{output.saf} &&
featureCounts \
-F SAF \
-T {threads} \
{params.readFiltering} \
{params.pairedEnd} \
-a {output.saf} \
-s 0 \
{params.multiOverlap} \
-o {output.BAMOverlapRaw} \
{input.allBAMs} &&
zgrep "{wildcards.TF}_TFBS\." {input.bed} | awk 'BEGIN {{ OFS = "\\t" }} {{print $4"_"$2"-"$3,$1,$2,$3,$6}}' | sort -u -k1,1 >{output.saf} || true &&
if [[ $(wc -l <{output.saf}) -eq "0" ]]; then
touch {output.BAMOverlapRaw}
echo "No TFBS found, skip featureCounts..."
else
featureCounts \
-F SAF \
-T {threads} \
{params.readFiltering} \
{params.pairedEnd} \
-a {output.saf} \
-s 0 \
{params.multiOverlap} \
-o {output.BAMOverlapRaw} \
{input.allBAMs}
fi &&
gzip -f < {output.BAMOverlapRaw} > {output.BAMOverlap}
"""
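The guard added to the rule above can be sketched standalone: when the SAF file is empty (no TFBS found for this TF), featureCounts is skipped and an empty output file is created instead, so downstream rules still find the file they expect. The paths below are placeholders for the example:

```shell
saf=/tmp/demo.saf
out=/tmp/demo.counts

: > "$saf"   # simulate "no TFBS found": create an empty SAF file

# Same check as in the rule: zero lines means nothing to count
if [[ $(wc -l < "$saf") -eq 0 ]]; then
    touch "$out"   # create the expected output so the pipeline continues
    echo "No TFBS found, skip featureCounts..."
else
    echo "would run featureCounts on $saf"
fi
```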
name_plots = PEAKS_DIR + "/" + compType + "diagnosticPlots.peaks.pdf"
rule DiffPeaks:
......