@@ -53,6 +53,14 @@ We also put the paper on *bioRxiv*, please read all methodological details here:
Change log
============================
Version 1.4 (2019-09-24)
- Various small improvements to the user experience
- For the peaks and TF-specific diagnostic plots, the pairwise mean-average plots between pairs of samples have been disabled by default. The maximum was previously set to 5, but because these plots are time-consuming to generate and of limited use, we feel they are not needed for most users. We might add a parameter in the future to adjust this more flexibly; contact us if you want to have them back.
- Various small improvements to the diagnostic plots from the last step (summaryFinal), most of which concern classification-related plots and plots for the permutation-based approach
- For all output tables, diffTF now outputs only 2-3 significant digits, which reduces the size of some output tables significantly. The memory footprint is thus decreased overall. More digits are not needed in our opinion.
- Added diagnostic plots for the RNA-Seq data that is used for the classification. These include MA plots (based on original and shrunken log2 fold changes), (non)normalized log count density plots across all samples, and a mean-sd plot
- Added preliminary support for interaction terms in the design formula. This was not possible before but should now work without diffTF failing. If you continue having issues, please let us know.
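
As an illustration of the interaction-term support, the following is a minimal sketch of how one could check a design formula before running the pipeline, assuming that every variable used in an interaction must also be specified as an individual term. The helper `missingMainEffects` is hypothetical and not part of diffTF; it relies only on base R's `terms()`.

```r
# Hypothetical helper (not part of diffTF): returns the variables that occur
# in an interaction term but are not specified as individual terms.
missingMainEffects <- function(designFormula) {
  # All term labels of the formula, e.g. "batch", "condition", "batch:condition"
  termLabels <- attr(terms(designFormula), "term.labels")
  # Keep only the interaction terms and split them into their components
  interactions <- grep(":", termLabels, fixed = TRUE, value = TRUE)
  components <- unique(as.character(unlist(strsplit(interactions, ":", fixed = TRUE))))
  # Components that never appear on their own are missing
  setdiff(components, termLabels)
}

# Every component is specified individually: nothing is missing
okFormula <- missingMainEffects(~ batch + condition + batch:condition)

# "batch" only occurs inside the interaction and is reported as missing
badFormula <- missingMainEffects(~ condition + batch:condition)
```

A non-empty result points to the variables that would have to be added to the formula as individual terms.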
Version 1.3.3 (2019-07-25)
- Fixed a bug that caused erroneously small p-values for the permutation-based approach in cases when the number of permutations that was actually done was smaller than what was specified in the configuration file (e.g., if 1000 permutations have been specified in the configuration file, but only 70 were actually done, such as for a typical 4 vs 4 analysis). This is now corrected, and the last step of the pipeline (*summaryFinal*) should be rerun to correct for it. Although this happened in the special circumstances described above (a small number of samples and yet using the permutation-based approach), we apologize for this oversight. Thanks to Frauke Huth for noticing!
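
To illustrate why the denominator matters, here is a minimal sketch with made-up numbers. It assumes a commonly used empirical p-value estimator, (number of extreme permutations + 1) / (number of permutations + 1); this is an assumption for illustration and not necessarily the exact formula diffTF uses. The function `empiricalPValue` and all values below are hypothetical.

```r
# Hypothetical helper (not diffTF's implementation): empirical p-value of an
# observed statistic given the statistics from the permutations.
empiricalPValue <- function(observed, permuted, nPermutations = length(permuted)) {
  # Number of permutations at least as extreme as the observed value
  nExtreme <- sum(abs(permuted) >= abs(observed))
  (nExtreme + 1) / (nPermutations + 1)
}

# A 4 vs 4 design allows only choose(8, 4) = 70 distinct sample labelings
nPossible  <- choose(8, 4)
nRequested <- 1000
nDone      <- min(nRequested, nPossible)

# Made-up permutation statistics: 2 of the 70 are as extreme as the observed one
permuted <- c(rep(0, nDone - 2), 2.5, -3)
observed <- 2

# Dividing by the requested number of permutations makes the p-value too small
pTooSmall <- empiricalPValue(observed, permuted, nPermutations = nRequested)

# Dividing by the number of permutations actually done gives the intended value
pCorrected <- empiricalPValue(observed, permuted)
```

In this toy example, `pTooSmall` is 3/1001 while `pCorrected` is 3/71, so using the requested rather than the actual permutation count understates the p-value by more than an order of magnitude.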
message=paste0("l2fc values could not be calculated for any TFBS, this may happen for complex design formulas in combination with particular permutations. For permutation ",permutationCur,", all values are NA.")
message=paste0("File ",bamCur," does not have the correct chromosome names. The \"chr\" prefix is required for proper chromosome names, but it was not found. Check your BAM files and use samtools and sed to add \"chr\" to each chromosome name")
message=paste0("Design formula is incorrect, not all terms from the interactions (",paste0(componentsCur,collapse=","),") have been specified individually.")
flog.info(paste0(" Plotting MA plot from DESeq (1)..."))
# 1. Regular MA plot:
# shows the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples in the DESeqDataSet
DESeq2::plotMA(dd,main="Regular MA plot")
# 2. MA plot based on shrunken log2 fold changes
# Shrinkage of effect size (LFC estimates) is useful for visualization and ranking of genes. To shrink the LFC, we pass the dd object to the function lfcShrink. Below we specify to use the apeglm method for effect size shrinkage (Zhu, Ibrahim, and Love 2018), which improves on the previous estimator.
# It is more useful to visualize the MA plot for the shrunken log2 fold changes, which removes the noise associated with log2 fold changes from low-count genes without requiring arbitrary filtering thresholds.
flog.info(paste0(" Plotting MA plot from DESeq (2)..."))
# To further assess systematic differences between the samples, we can also plot pairwise mean–average plots: We plot the average of the log–transformed counts vs the fold change per gene for each of the sample pairs.
MA.idx=t(combn(seq_len(dim(colData(dd))[1]),2))
if(nrow(MA.idx)>maxPairwiseComparisons){
flog.info(paste0("The number of pairwise comparisons to plot exceeds the current maximum of ",maxPairwiseComparisons,". Only ",maxPairwiseComparisons," pairwise comparisons will be shown in the PDF."))
flog.info(paste0(" Plotting pairwise sample comparisons. This may take a while."))
# 3. Pairwise sample comparisons.
# To further assess systematic differences between the samples, we can also plot pairwise mean–average plots: We plot the average of the log–transformed counts vs the fold change per gene for each of the sample pairs.
MA.idx=t(combn(seq_len(dim(colData(dd))[1]),2))
if(nrow(MA.idx)>maxPairwiseComparisons){
flog.info(paste0("The number of pairwise comparisons to plot exceeds the current maximum of ",maxPairwiseComparisons,". Only ",maxPairwiseComparisons," pairwise comparisons will be shown in the PDF."))
message=paste0("All remaining pairwise comparisons plots\nbetween samples have been omitted\nfor time and memory reasons.\nThe current maximum is set to ",maxPairwiseComparisons,".")
text(x=0.5,y=0.5,message,cex=1.6,col="red")
}
# Show an empty page with a warning if plots have been omitted
message=paste0("All remaining pairwise comparisons plots\nbetween samples have been omitted\nfor time and memory reasons.\nThe current maximum is set to ",maxPairwiseComparisons,".")
text(x=0.5,y=0.5,message,cex=1.6,col="red")
}
# 4. Mean SD plot: Plot row standard deviations versus row means
notAllZeroPeaks<-(rowSums(DESeq2::counts(dd))>0)
...
...
@@ -691,7 +735,7 @@ heatmap.act.rep <- function(df.tf.peak.matrix, HOCOMOCO_mapping.df.exp, cor.m, p
message=paste0("Design formula is incorrect, not all terms from the interactions (",paste0(componentsCur,collapse=","),") have been specified individually.")