Peak-gene confounding factors
Summary of our current understanding of how and why peak-gene correlations may be biased in the background as well.
Issue: random networks often still show high signal, visible as an enrichment of low p-values when correlations are calculated for randomly picked peak-gene pairs.
Current decision: there is no need for concern as long as the signal in the p-values is (1) higher for positive than for negative correlations in the real data and (2) higher for positive correlations in the real data than in the random data (the AML data, for example, look good). However, if the signal is similarly high for negative and positive correlations, we should be concerned and try to figure out why.
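The decision rule above can be sketched in a few lines. This is only an illustration under assumptions: "signal" is taken here to mean the fraction of p-values below a cutoff, each peak-gene pair is represented as a (correlation, p-value) tuple, and the names `low_p_fraction`, `check_signal` and `alpha` are hypothetical and not part of any existing pipeline.

```python
def low_p_fraction(pairs, sign, alpha=0.05):
    """Fraction of pairs with the given correlation sign whose p-value is below alpha.

    pairs: iterable of (correlation, p_value) tuples (assumed representation).
    sign:  +1 for positive correlations, -1 for negative ones.
    """
    # Correlations of exactly zero belong to neither group.
    selected = [p for r, p in pairs if r != 0 and (r > 0) == (sign > 0)]
    if not selected:
        return 0.0
    return sum(p < alpha for p in selected) / len(selected)


def check_signal(real_pairs, random_pairs, alpha=0.05):
    """Apply the two comparisons from the current decision."""
    real_pos = low_p_fraction(real_pairs, +1, alpha)
    real_neg = low_p_fraction(real_pairs, -1, alpha)
    random_pos = low_p_fraction(random_pairs, +1, alpha)
    return {
        "real_pos": real_pos,
        "real_neg": real_neg,
        "random_pos": random_pos,
        # No concern if positives beat negatives in the real data
        # and real positives beat random positives.
        "ok": real_pos > real_neg and real_pos > random_pos,
    }
```

A single cutoff is a crude summary of a p-value distribution; comparing the full histograms or QQ plots for the same three groups would follow the same logic.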
Ideas discussed for selecting a threshold for peak-gene correlations:
- Use the distribution of negative correlations (from the real network) as background for the positive correlations, e.g. by mirroring the negative correlations around zero.
- Same idea, but use the p-value distributions to calculate an FDR: foreground (fg) = p-values from positive correlations, background (bg) = p-values from negative correlations.
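Both ideas can be sketched roughly as follows. This is one possible reading, not an agreed method: the first function takes a quantile of the mirrored negative correlations as the cutoff for positive correlations, and the second estimates the FDR at a p-value cutoff as the background hit rate divided by the foreground hit rate. The names `mirrored_threshold`, `empirical_fdr`, `quantile` and `cutoff` are hypothetical.

```python
def mirrored_threshold(correlations, quantile=0.95):
    """Correlation cutoff taken from the negative correlations mirrored around zero."""
    # Mirror the negative correlations onto the positive axis and sort them.
    mirrored = sorted(-r for r in correlations if r < 0)
    if not mirrored:
        raise ValueError("no negative correlations to build a background from")
    # Simple rank-based quantile without interpolation.
    index = min(len(mirrored) - 1, int(quantile * len(mirrored)))
    return mirrored[index]


def empirical_fdr(fg_pvalues, bg_pvalues, cutoff):
    """Estimated FDR at a p-value cutoff: background hit rate / foreground hit rate.

    fg_pvalues: p-values from positive correlations (foreground).
    bg_pvalues: p-values from negative correlations (background).
    """
    if not fg_pvalues or not bg_pvalues:
        raise ValueError("foreground and background must both be non-empty")
    fg_rate = sum(p <= cutoff for p in fg_pvalues) / len(fg_pvalues)
    bg_rate = sum(p <= cutoff for p in bg_pvalues) / len(bg_pvalues)
    if fg_rate == 0:
        return 1.0  # nothing passes in the foreground, so no discoveries to trust
    return min(1.0, bg_rate / fg_rate)
```

Using rates rather than raw counts is a design choice made here so that unequal numbers of positive and negative correlations do not distort the estimate. Both functions rely on the assumption that negative correlations are mostly noise, which is exactly what breaks down in the case described above where negative and positive correlations show similarly high signal.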