Commit 21fb4f25 authored by Christian Arnold's avatar Christian Arnold

Major update. Integrated old and new version into one

parent f3da2354
This diff is collapsed.
......@@ -85,7 +85,7 @@ Snakemake
Please ensure that you have at least version 4.3 installed. Principally, there are `multiple ways to install Snakemake <http://snakemake.readthedocs.io/en/stable/getting_started/installation.html>`_. We recommend installing it, along with all the other required software, via conda.
*samtools*, *bedtool*s, *Subread*
*samtools*, *bedtools*, *Subread*
----------------------------------
In addition, `samtools <http://www.htslib.org/download>`_, `bedtools <http://bedtools.readthedocs.io>`_ and `Subread <http://subread.sourceforge.net>`_ are needed to run *diffTF*. We recommend installing them, along with all the other required software, via conda.
......@@ -98,11 +98,10 @@ A working ``R`` installation is needed and a number of packages from either CRAN
.. code-block:: R
install.packages(c("checkmate", "futile.logger", "tidyverse", "reshape2", "gridExtra", "scales", "jsonlite", "RColorBrewer", "rlist", "ggrepel", "lsr", "modeest", "locfdr", "boot"))
install.packages(c("checkmate", "futile.logger", "tidyverse", "reshape2", "RColorBrewer", "ggrepel", "lsr", "modeest", "boot", "grDevices", "pheatmap", "matrixStats", "locfdr"))
source("https://bioconductor.org/biocLite.R")
biocLite(c("limma", "vsn", "csaw", "DESeq2", "EDASeq", "DiffBind", "geneplotter", "Rsamtools"))
biocLite(c("limma", "vsn", "csaw", "DESeq2", "DiffBind", "geneplotter", "Rsamtools"))
.. _docs-runOwnAnalysis:
......@@ -115,4 +114,4 @@ Running your own analysis is almost as easy as running the example analysis. Car
2. Modify the file ``config.json`` accordingly. For example, we strongly recommend running the analysis for all TF instead of just 50 as for the example analysis. For this, simply change the parameter “TFs” to “all”. See Section :ref:`configurationFile` for details about the meaning of the parameters. Do not delete or rename any parameters or sections.
3. Create a tab-separated file that defines the input data, in analogy to the file ``sampleData.tsv`` from the example analysis, and refer to that in the file ``config.json`` (parameter ``summaryFile``)
4. Adapt the file ``startAnalysis.sh`` if necessary (the exact command line call to Snakemake and the various Snakemake-related parameters)
5. Since running the pipeline might be computationally demanding, read Section :ref:`timeMemoryRequirements` and decide on which machine to run the pipeline. In most cases, we recommend running *diffTF* in a cluster environment. The pipeline is written in Snakemake, and we strongly suggest to also read Section :ref:`workingWithPipeline` to get a basic understanding of how the pipeline works.
5. Since running the pipeline is often computationally demanding, read Section :ref:`timeMemoryRequirements` and decide on which machine to run the pipeline. In most cases, we recommend running *diffTF* in a cluster environment (see Section :ref:`clusterEnvironment` for details). The pipeline is written in Snakemake, and we strongly suggest to also read Section :ref:`workingWithPipeline` to get a basic understanding of how the pipeline works.
This diff is collapsed.
......@@ -2,7 +2,10 @@
Biological motivation
============================
Transcription factor (TF) activity constitutes an important readout of cellular signalling pathways and thus for assessing regulatory differences across conditions. However, current technologies lack the ability to simultaneously assessing activity changes for multiple TFs and surprisingly little is known about whether a TF acts as repressor or activator. To this end, we introduce the widely applicable genome-wide method diffTF to assess differential TF binding activity and classifying TFs as activator or repressor by integrating any type of genome-wide chromatin with RNA-Seq data and in-silico predicted TF binding sites
Transcription factor (TF) activity constitutes an important readout of cellular signalling pathways and thus for assessing regulatory differences across conditions. However, current technologies lack the ability to simultaneously assessing activity changes for multiple TFs and surprisingly little is known about whether a TF acts as repressor or activator. To this end, we introduce the widely applicable genome-wide method diffTF to assess differential TF binding activity and classifying TFs as activator or repressor by integrating any type of genome-wide chromatin with RNA-Seq data and in-silico predicted TF binding sites.
For a graphical summary of the idea, see the section :ref:`workflow`
Help, contribute and contact
============================
......@@ -16,7 +19,7 @@ Citation
If you use this software, please cite the following reference:
Ivan Berest*, Christian Arnold*, Armando Reyes-Palomares, Kasper Dindler Rassmussen, Kristian Helin & Judith B. Zaugg. *Genome-wide quantification of differential transcription factor activity: diffTF*. 2017. submitted.
Ivan Berest*, Christian Arnold*, Armando Reyes-Palomares, Giovanni Palla, Kasper Dindler Rassmussen, Kristian Helin & Judith B. Zaugg. *Genome-wide quantification of differential transcription factor activity: diffTF*. 2018. submitted.
Change log
......
{
"par_general":
{
"outdir" : "../output",
"regionExtension" : 100,
"comparisonType" : "GMPvsMPP",
"designContrast" : "~ conditionSummary",
"designVariableTypes" : "conditionSummary:factor",
"nPermutations" : 3,
"nBootstraps" : 0,
"nCGBins" : 10,
"TFs" : "CTCF,CEBPB,SNAI2,CEBPA,UBIP1,CEBPG,CEBPD,ZFX,AP2D,PAX5.S,SNAI1,ZEB1,SP4,MBD2,IRF1,MECP2,PAX5.D,SP3,NFIA.C,SP1.A,IRF7,MYF6, NRF1,DBP,MAZ,NKX28,DLX2,GATA1,P53,ZN143,AIRE,NR2C2,HMGA1,FUBP1,TEAD3,OVOL1,HXD4,KLF1,RXRG,HNF1B,ZIC3,HNF1A,NANOG.S,GFI1,PO3F1,NR2C1,ELF5,TF65.C,NFAC3,TEAD1",
"dir_scripts" : "../../src/R",
"RNASeqIntegration" : true
},
"samples":
{
"summaryFile" : "sampleData.tsv"
"par_general": {
"outdir": "../output",
"regionExtension": 100,
"comparisonType": "GMPvsMPP.all",
"conditionComparison": "GMP,MPP",
"designContrast": "~ conditionSummary",
"designVariableTypes": "conditionSummary:factor",
"nPermutations": 5,
"nBootstraps": 0,
"nCGBins": 10,
"TFs": "CTCF,CEBPB,SNAI2,CEBPA,UBIP1,CEBPG,CEBPD,ZFX,AP2D,PAX5.S,SNAI1,ZEB1,SP4,MBD2,IRF1,MECP2,PAX5.D,SP3,NFIA.C,SP1.A,IRF7,MYF6, NRF1,DBP,MAZ,NKX28,DLX2,GATA1,P53,ZN143,AIRE,NR2C2,HMGA1,FUBP1,TEAD3,OVOL1,HXD4,KLF1,RXRG,HNF1B,ZIC3,HNF1A,NANOG.S,GFI1,PO3F1,NR2C1,ELF5,TF65.C,NFAC3,TEAD1",
"TFs_disabled": "all",
"dir_scripts": "../../src/R",
"RNASeqIntegration": true
},
"peaks":
{
"consensusPeaks" : "",
"peakType" : "narrow",
"minOverlap" : 2
},
"additionalInputFiles":
{
"refGenome_fasta" : "referenceGenome/mm10.fa",
"dir_TFBS" : "mm10/PWMScan_HOCOMOCOv10",
"RNASeqCounts" : "data/RNA-Seq/RNA.counts.tsv",
"HOCOMOCO_mapping" : "../../src/TF_Gene_TranslationTables/HOCOMOCO_v10/translationTable_mm10.csv"
"samples": {
"summaryFile": "sampleData.tsv"
},
"peaks": {
"consensusPeaks": "",
"peakType": "narrow",
"minOverlap": 2
},
"additionalInputFiles": {
"refGenome_fasta": "referenceGenome/mm10.fa",
"dir_TFBS": "mm10/PWMScan_HOCOMOCOv10",
"RNASeqCounts": "data/RNA-Seq/RNA.counts.tsv",
"HOCOMOCO_mapping": "../../src/TF_Gene_TranslationTables/HOCOMOCO_v10/translationTable_mm10.csv"
}
}
......@@ -17,4 +17,4 @@ echo "# INCREASING THE NUMBER OF CORES SPEEDS UP THE ANALYSIS #"
echo "#########################################################"
# Real run, using 2 cores
snakemake --snakefile ../../src/Snakefile --cores 2 --configfile config.json --timestamp --directory .
snakemake --snakefile ../../src/Snakefile --cores 2 --configfile config.json --timestamp
......@@ -10,5 +10,6 @@ echo "####################################################################"
echo "\nThis wrapper script calls Snakemake in dryrun mode to start diffTF. Nothing will actually be executed.\n"
# Dryrun, using 2 cores
snakemake --snakefile ../../src/Snakefile --dryrun --cores 2 --configfile config.json
snakemake --snakefile ../../src/Snakefile --dryrun --quiet --cores 2 --configfile config.json
par.l$designFormula = "~Treatment + conditionSummary"
par.l =list()
library(checkmate)
par.l$designFormula = "~Treatment + conditionSummary"
as.formula(par.l$designFormula)
formulaDesign = as.formula(par.l$designFormula)
as.formula("blaaaa")
as.formula("bla")
as.formula("~bla")
designFormula = tryCatch({
as.formula(par.l$designFormula)
}, warning = function(w) {
stop("Converting the design formula ", par.l$designFormula, " created a warning, which should be checked carefully.")
}, error = function(e) {
stop("Design formula ", par.l$designFormula, " not valid")
}
designFormula
designFormula = tryCatch({
as.formula(par.l$designFormula)
}, warning = function(w) {
stop("Converting the design formula ", par.l$designFormula, " created a warning, which should be checked carefully.")
}, error = function(e) {
stop("Design formula ", par.l$designFormula, " not valid")
}
par.l$designFormula = "~Treatment + conditionSummary"
)
designFormula = tryCatch({
as.formula(par.l$designFormula)
}, warning = function(w) {
stop("Converting the design formula ", par.l$designFormula, " created a warning, which should be checked carefully.")
}, error = function(e) {
stop("Design formula ", par.l$designFormula, " not valid")
})
designFormula
par.l$designFormula = "~Treatment design"
designFormula = tryCatch ({
as.formula(par.l$designFormula)
}, warning = function(w) {
stop("Converting the design formula ", par.l$designFormula, " created a warning, which should be checked carefully.")
}, error = function(e) {
stop("Design formula ", par.l$designFormula, " not valid")
})
designFormula = tryCatch ({
as.formula(par.l$designFormula)
}, warning = function(w) {
stop("Converting the design formula \", par.l$designFormula, "\" created a warning, which should be checked carefully.")
}, error = function(e) {
stop("Design formula \", par.l$designFormula, "\" not valid")
})
designFormula
str(designFormula)
?formula
terms(formula)
terms(designFormula)
str(terms(designFormula))
attr(terms(designFormula), "term.labels")
??removeBatchEffect
library(limma)
install.packages(limma)
install.packages("limma")
source("https://bioconductor.org/biocLite.R")
biocLite("limma")
librar(limma)
library(limma)
??removeBatchEffect
library(matrixStats)
??matrixStats
?matrixStats
......@@ -45,6 +45,8 @@ assertList(snakemake@config, min.len = 1)
file_peaks = snakemake@config$peaks$consensusPeaks
conditionComparison = strsplit(snakemake@config$par_general$conditionComparison, ",")[[1]]
assertVector(conditionComparison, len = 2)
TFBS_dir = snakemake@config$additionalInputFiles$dir_TFBS
assertDirectoryExists(dirname(TFBS_dir), access = "r")
......@@ -53,6 +55,13 @@ fastaFile = snakemake@config$additionalInputFiles$refGenome_fasta
assertFileExists(fastaFile)
assertDirectoryExists(dirname(fastaFile), access = "w")
par.l$nPermutations = snakemake@config$par_general$nPermutations
assertIntegerish(par.l$nPermutations, lower = 0, len = 1)
par.l$nBootstraps = as.integer(snakemake@config$par_general$nBootstraps)
assertIntegerish(par.l$nBootstraps, len = 1)
par.l$file_input_sampleData = snakemake@config$samples$summaryFile
checkAndLogWarningsAndErrors(par.l$file_input_sampleData, checkFileExists(par.l$file_input_sampleData, access = "r"))
......@@ -81,7 +90,7 @@ printParametersLog(par.l)
checkAndLoadPackages(c("tidyverse", "futile.logger", "DiffBind", "checkmate", "stats"), verbose = FALSE)
# Step 2
checkAndLoadPackages(c("tidyverse", "futile.logger", "DESeq2", "vsn", "csaw", "checkmate", "limma", "tools", "EDASeq", "geneplotter", "RColorBrewer", "BiocParallel", "rlist"), verbose = FALSE)
checkAndLoadPackages(c("tidyverse", "futile.logger", "DESeq2", "vsn", "csaw", "checkmate", "limma", "tools", "geneplotter", "RColorBrewer"), verbose = FALSE)
# Step 3
checkAndLoadPackages(c("tidyverse", "futile.logger", "DESeq2", "vsn", "modeest", "checkmate", "limma", "geneplotter", "RColorBrewer", "tools"), verbose = FALSE)
......@@ -90,11 +99,24 @@ checkAndLoadPackages(c("tidyverse", "futile.logger", "DESeq2", "vsn", "modeest",
checkAndLoadPackages(c("tidyverse", "futile.logger", "modeest", "checkmate", "ggrepel"), verbose = FALSE)
# Step 5
checkAndLoadPackages(c("tidyverse", "futile.logger", "checkmate", "tools", "methods"), verbose = FALSE)
checkAndLoadPackages(c("tidyverse", "futile.logger", "checkmate", "tools", "methods", "boot"), verbose = FALSE)
# Step 6
checkAndLoadPackages(c("tidyverse", "futile.logger", "lsr", "ggrepel", "checkmate", "tools", "methods", "boot"), verbose = TRUE)
checkAndLoadPackages(c("tidyverse", "futile.logger", "lsr", "ggrepel", "checkmate", "tools", "methods", "grDevices", "pheatmap"), verbose = TRUE)
# Check the version of readr, at least 1.1.0 is required to properly write gz files
if (packageVersion("readr") < "1.1.0") {
message = paste0("Version of readr library is too old, at least version 1.1.0 is required). Execute the following in R: install.packages('readr') ")
checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
}
if (par.l$nPermutations == 0 && par.l$nBootstraps < 1000) {
flog.warn(paste0("The value for nBootstraps is < 1000. We strongly recommend using a higher value in order to reliably estimate the statistical variance."))
}
#############################
# CHECK FASTA AND BAM FILES #
......@@ -102,6 +124,18 @@ checkAndLoadPackages(c("tidyverse", "futile.logger", "lsr", "ggrepel", "checkmat
sampleData.df = read_tsv(par.l$file_input_sampleData, col_names = TRUE, col_types = cols())
# Check the sample table
nDistValues = length(unique(sampleData.df$conditionSummary))
if (nDistValues != 2) {
message = paste0("The column 'conditionSummary' must contain exactly 2 different values, but ", nDistValues, " were found.")
checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
}
if (!testSubset(unique(sampleData.df$conditionSummary), conditionComparison)) {
message = paste0("The elements specified in 'conditionComparison' in the config file must be a subset of the values in the column 'conditionSummary' in the sample file")
checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE)
}
# Build the fasta index. Requires write access to the folder where the fasta is stored (limitation of samtools faidx)
fastaIndex = paste0(fastaFile, ".fai")
......@@ -147,6 +181,8 @@ for (bamCur in sampleData.df$bamReads) {
}
flog.info(paste0("Check peak files..."))
###################
# CHECK PEAK FILE #
###################
......@@ -219,6 +255,8 @@ if (file_peaks != "") {
}
flog.info(paste0("Check TF-specific TFBS files..."))
TFs = createFileList(TFBS_dir, "_TFBS.bed", verbose = FALSE)
if (length(TFs) == 0) {
......
......@@ -41,7 +41,7 @@ assertClass(snakemake, "Snakemake")
## INPUT ##
assertList(snakemake@input, min.len = 1)
assertSubset(names(snakemake@input), c("", "peaks", "checkFlag"))
assertSubset(names(snakemake@input), c("", "peaks", "checkFlag", "sampleFile"))
par.l$file_input_peaks = snakemake@input$peaks
......@@ -127,6 +127,7 @@ consensusPeaks.df$annotation = paste0(consensusPeaks.df$chr, ":", consensusPeaks
# Reorganize columns
consensusPeaks.df = consensusPeaks.df[,c("chr", "start", "end", "annotation", "pvalue")]
consensusPeaks.df = mutate_if(consensusPeaks.df, is.numeric, as.character)
write_tsv(consensusPeaks.df, path = par.l$file_output_consensusPeaks, col_names = FALSE)
.printExecutionTime(start.time)
......
......@@ -99,8 +99,6 @@ for (fileCur in par.l$files_input_TF_summary) {
peaks.df = read_tsv(par.l$file_input_peaks, col_types = cols())
assertSubset(colnames(peaks.df), c("permutation", "position", "D2_baseMean", "D2_l2FC", "D2_ldcSE", "D2_stat", "D2_pval", "D2_padj"))
summary.df = NULL
nTFs = length(par.l$filesTransl.l)
......@@ -149,18 +147,18 @@ if (nTFMissing == nrow(summary.df)) {
summary.df$pvalue_raw[summary.df$pvalue_raw == 0] = .Machine$double.xmin
mode_peaks = mlv(peaks.df$D2_l2FC, method = "mfv", na.rm = TRUE)
mode_peaks = mlv(peaks.df$l2FC, method = "mfv", na.rm = TRUE)
summary.df = summary.df %>%
dplyr::mutate(
adj_pvalue = p.adjust(pvalue_raw, method = "fdr"),
Diff_mean = Mean_l2FC - mean(peaks.df$D2_l2FC, na.rm = TRUE),
DiffMedian = Median_l2FC - median(peaks.df$D2_l2FC, na.rm = TRUE),
Diff_mode = Mode_l2FC - mode_peaks[[1]],
Diff_skew = skewness_l2FC - mode_peaks[[2]]) %>%
adj_pvalue = p.adjust(pvalue_raw, method = "fdr"),
Diff_mean = Mean_l2FC - mean(peaks.df$l2FC, na.rm = TRUE),
Diff_median = Median_l2FC - median(peaks.df$l2FC, na.rm = TRUE),
Diff_mode = Mode_l2FC - mode_peaks[[1]],
Diff_skew = skewness_l2FC - mode_peaks[[2]]) %>%
na.omit(summary.df)
summary.df = mutate_if(summary.df, is.numeric, as.character)
write_tsv(summary.df, par.l$file_output_table) # TODO: check the dec = "." parameter
.printExecutionTime(start.time)
......
This diff is collapsed.
## README: prepare Permutations, to make it more efficient
start.time <- Sys.time()
#########################
# LIBRARY AND FUNCTIONS #
#########################
library("checkmate")
assertClass(snakemake, "Snakemake")
assertDirectoryExists(snakemake@config$par_general$dir_scripts)
source(paste0(snakemake@config$par_general$dir_scripts, "/functions.R"))
initFunctionsScript(packagesReq = NULL, minRVersion = "3.1.0", warningsLevel = 1, disableScientificNotation = TRUE)
checkAndLoadPackages(c("tidyverse", "futile.logger", "checkmate", "tools", "methods", "BiocParallel", "rlist"), verbose = FALSE)
# methods needed here because Rscript does not loads this package automatically, see http://stackoverflow.com/questions/19468506/rscript-could-not-find-function
########################################################################
# SAVE SNAKEMAKE S4 OBJECT THAT IS PASSED ALONG FOR DEBUGGING PURPOSES #
########################################################################
# Use the following line to load the Snakemake object to manually rerun this script (e.g., for debugging purposes)
# Replace {outputFolder} correspondingly.
# snakemake = readRDS("{outputFolder}/LOGS_AND_BENCHMARKS/5.prepareBinning.R.rds")
createDebugFile(snakemake)
###################
#### PARAMETERS ###
###################
par.l = list()
par.l$verbose = TRUE
par.l$log_minlevel = "INFO"
#####################
# VERIFY PARAMETERS #
#####################
assertClass(snakemake, "Snakemake")
## INPUT ##
assertList(snakemake@input, min.len = 1)
assertSubset(names(snakemake@input), c("", "nucContent", "motifes"))
par.l$files_input_nucContentGenome = snakemake@input$nucContent
for (fileCur in par.l$files_input_nucContentGenome) {
assertFileExists(fileCur , access = "r")
}
par.l$file_input_TF_allMotives = snakemake@input$motifes
assertFileExists(par.l$file_input_TF_allMotives, access = "r")
## OUTPUT ##
assertList(snakemake@output, min.len = 1)
assertSubset(names(snakemake@output), c("", "allTFData", "allTFUniqueData"))
par.l$file_output_allTF = snakemake@output$allTFData
par.l$file_output_allTFUnique = snakemake@output$allTFUniqueData
## CONFIG ##
assertList(snakemake@config, min.len = 1)
par.l$nPermutations = snakemake@config$par_general$nPermutations
assertIntegerish(par.l$nPermutations, lower = 0)
## PARAMS ##
assertList(snakemake@params, min.len = 1)
assertSubset(names(snakemake@params), c("", "nCGBins"))
par.l$nBins = as.integer(snakemake@params$nCGBins)
assertIntegerish(par.l$nBins, lower = 1, upper = 100)
## LOG ##
assertList(snakemake@log, min.len = 1)
par.l$file_log = snakemake@log[[1]]
allDirs = c(dirname(par.l$file_output_allTF),
dirname(par.l$file_output_allTFUnique),
dirname(par.l$file_log)
)
testExistanceAndCreateDirectoriesRecursively(allDirs)
######################
# FINAL PREPARATIONS #
######################
startLogger(par.l$file_log, par.l$log_minlevel, removeOldLog = TRUE)
printParametersLog(par.l)
#########################
# READ ALL MOTIVES FILE #
#########################
# import original dataframe with motifs. Has to be read only once
TF.motifs.ori = read_tsv(par.l$file_input_TF_allMotives, col_names = TRUE, col_types = cols())
if (nrow(TF.motifs.ori) == 0) {
message = paste0("The file ", par.l$file_input_TF_allMotives, " is empty. Something went wrong before. Make sure the previous steps succeeded.")
flog.fatal(message)
stop(message)
}
# TODO: eventuell löschen: chr, MSS, MES, PSS, PES
#Filter permutations in the original files that the user does not want anymore
TF.motifs.ori = filter(TF.motifs.ori, permutation <= par.l$nPermutations)
TF.motifs.ori$CG.identifier = paste0(TF.motifs.ori$TF,":",TF.motifs.ori$TFBSID)
#########################
# READ ALL NUC CG FILES #
#########################
readFile <- function(permutationCur, par.l) {
fileCur = par.l$files_input_nucContentGenome[permutationCur + 1]
if (permutationCur > 0) {
flog.info(paste0("Reading and processing file ", fileCur, " from permutation ", permutationCur))
} else {
flog.info(paste0("Reading and processing file ", fileCur, " from original data ", permutationCur))
}
TF.motifs.CG.cur = read_tsv(fileCur, col_names = TRUE,
col_types = list(
col_skip(), # "chr"
col_skip(), # "MSS"
col_skip(), # "MES"
col_character(), # "strand"
col_character(), # "TF",
col_skip(), # "AT"
col_number(), # "CG"
col_skip(), # "A"
col_skip(), # "C"
col_skip(), # "G"
col_skip(), # "T"
col_skip(), # "N"
col_skip(), # "other_nucl"
col_skip() # "length"
)
)
if (nrow(problems(TF.motifs.CG.cur)) > 0) {
flog.fatal(paste0("Parsing errors: "), problems(TF.motifs.CG.cur), capture = TRUE)
stop("Error when parsing the file ", fileCur, ", see errors above")
}
colnames(TF.motifs.CG.cur) = c("TFBSID","TF","CG")
nRowsNA = length(which(is.na(TF.motifs.CG.cur$CG)))
if (nRowsNA > 0) {
message <- paste0("The file ", fileCur, " contains ", nRowsNA, " rows out of ", nrow(TF.motifs.CG.cur), " with a missing value for the CG content, which most likely results from an assembly discordance between the BAM files and the specified fasta file. These regions will be removed in subsequent steps. The first 10 are printed with their first 3 columns here for debugging purposes:")
message = paste0(message, paste0(unlist(TF.motifs.CG.cur[1:10,"chr"]), ":", unlist(TF.motifs.CG.cur[1:10,"MSS"]), "-", unlist(TF.motifs.CG.cur[1:10,"MES"]), collapse = ", "))
checkAndLogWarningsAndErrors(NULL, message, isWarning = TRUE)
TF.motifs.CG.cur = TF.motifs.CG.cur[-nRowsNA,]
}
TF.motifs.CG.cur = TF.motifs.CG.cur %>%
mutate(CG.identifier = paste0(TF,":" ,TFBSID), permutation = permutationCur) %>%
select(-one_of("TFBSID", "TF"))
return(TF.motifs.CG.cur)
}
nCores = snakemake@threads
results.l = .execInParallelGen(nCores,
returnAsList = TRUE, listNames = paste0("permutation", 0:par.l$nPermutations),
iteration = 0:par.l$nPermutations,
abortIfErrorParallel = TRUE,
verbose = TRUE,
readFile, par.l)
TF.motifs.CG = do.call("rbind", results.l)
#########
# MERGE #
#########
CGBins = seq(0,1, 1/par.l$nBins)
TF.motifs.all = TF.motifs.ori %>%
full_join(TF.motifs.CG, by = c("CG.identifier", "permutation")) %>%
select(-one_of("CG.identifier")) %>%
mutate(CG.bins = cut(CG, breaks = CGBins, labels = paste0(round(CGBins[-1] * 100,0),"%"), include.lowest = TRUE))
# Not needed anymore, delete
rm(TF.motifs.CG)
rm(TF.motifs.ori)
# remove duplicated TFBS from different TFs to use in the permuations
TF.motifs.all.unique = TF.motifs.all[!duplicated(TF.motifs.all[,c("permutation", "TFBSID")]),]
# TODO: TFBSID and peakID needed?
flog.info(paste0("Found ", nrow(TF.motifs.all) - nrow(TF.motifs.all.unique), " duplicated TFBS across all TF."))
saveRDS(TF.motifs.all, file = par.l$file_output_allTF)
saveRDS(TF.motifs.all.unique, file = par.l$file_output_allTFUnique)
\ No newline at end of file
This diff is collapsed.
This diff is collapsed.
......@@ -421,101 +421,34 @@ myMAPlot <- function(M, idx, main, minMean = 0) {
}
printMultipleGraphsPerPage <- function(plots.l, nCol = 1, nRow = 1, pdfFile = NULL, height = NULL, width = NULL, verbose = FALSE) {
plotDiagnosticPlots <- function(dd, differentialResults, conditionComparison, filename = NULL, maxPairwiseComparisons = 5, alpha = 0.05) {
checkAndLoadPackages(c("grDevices", "gridExtra"), verbose = verbose)
assertList(plots.l, min.len = 1)
assertInt(nCol, lower = 1)
assertInt(nRow, lower = 1)
assert(checkNull(pdfFile), checkCharacter(pdfFile))
if (!testNull(pdfFile)) assertDirectory(dirname(pdfFile), access = "r")
assert(checkNull(height), checkNumber(height, lower = 1))
assert(checkNull(width), checkNumber(width, lower = 1))
if (testNull(height)) {
height = 7
}
if (testNull(width)) {
width = 7
}
for (i in seq_len(length(plots.l))) {
assertClass(plots.l[[i]], classes = c("ggplot", "gg"))
}
nPlotsPerPage = nCol * nRow
plotsNew.l = list()
if (!testNull(pdfFile)) {
clearOpenDevices()
pdf(pdfFile, height = height, width = width)
}
index = 0