── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.0 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.1 ✔ tibble 3.2.0
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
Attaching package: 'magrittr'
The following object is masked from 'package:purrr':
set_names
The following object is masked from 'package:tidyr':
extract
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:lubridate':
intersect, setdiff, union
The following objects are masked from 'package:dplyr':
combine, intersect, setdiff, union
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
anyDuplicated, aperm, append, as.data.frame, basename, cbind,
colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:lubridate':
second, second<-
The following objects are masked from 'package:dplyr':
first, rename
The following object is masked from 'package:tidyr':
expand
The following objects are masked from 'package:base':
expand.grid, I, unname
Loading required package: IRanges
Attaching package: 'IRanges'
The following object is masked from 'package:lubridate':
%within%
The following objects are masked from 'package:dplyr':
collapse, desc, slice
The following object is masked from 'package:purrr':
reduce
Loading required package: GenomeInfoDb
Attaching package: 'GenomicRanges'
The following object is masked from 'package:magrittr':
subtract
Loading required package: grid
Loading required package: Biostrings
Loading required package: XVector
Attaching package: 'XVector'
The following object is masked from 'package:purrr':
compact
Attaching package: 'Biostrings'
The following object is masked from 'package:grid':
pattern
The following object is masked from 'package:base':
strsplit
Attaching package: 'gridExtra'
The following object is masked from 'package:BiocGenerics':
combine
The following object is masked from 'package:dplyr':
combine
Attaching package: 'data.table'
The following object is masked from 'package:GenomicRanges':
shift
The following object is masked from 'package:IRanges':
shift
The following objects are masked from 'package:S4Vectors':
first, second
The following objects are masked from 'package:lubridate':
hour, isoweek, mday, minute, month, quarter, second, wday, week,
yday, year
The following objects are masked from 'package:dplyr':
between, first, last
The following object is masked from 'package:purrr':
transpose
Registered S3 method overwritten by 'gplots':
method from
reorder.factor gdata
ChIPseeker v1.34.1 For help: https://guangchuangyu.github.io/software/ChIPseeker
If you use ChIPseeker in published research, please cite:
Qianwen Wang, Ming Li, Tianzhi Wu, Li Zhan, Lin Li, Meijun Chen, Wenqin Xie, Zijing Xie, Erqiang Hu, Shuangbin Xu, Guangchuang Yu. Exploring epigenomic datasets by ChIPseeker. Current Protocols 2022, 2(10): e585
Loading required package: graph
Attaching package: 'graph'
The following object is masked from 'package:Biostrings':
complement
The following object is masked from 'package:stringr':
boundary
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: GO.db
Loading required package: AnnotationDbi
Attaching package: 'AnnotationDbi'
The following object is masked from 'package:dplyr':
select
Loading required package: SparseM
Attaching package: 'SparseM'
The following object is masked from 'package:base':
backsolve
groupGOTerms: GOBPTerm, GOMFTerm, GOCCTerm environments built.
Attaching package: 'topGO'
The following object is masked from 'package:grid':
depth
The following object is masked from 'package:IRanges':
members
Loading required package: GenomicFeatures
Attaching package: 'GenomicFeatures'
The following object is masked from 'package:topGO':
genes
config = load_config()
# load CHT results
cht_full = lapply(ab_tp_list, function(ab_tp) load_cht_results(ab_tp, remove_chr = F)) %>% bind_rows()
cht = cht_full %>% filter(!TEST.SNP.CHROM %in% c("chrX", "chrY", "chrM"))
cht_sign = cht %>% filter(signif_strongAI)
# genes and promoters
genes = load_genes()
promoters = resize(genes, width = 1000, fix = "start")
# combined motif set (all TFs, peaks + alleles)
fimo = get_full_motif_sets(cht, ab_tp_list)
# only alleles
fimo_alleles = lapply(ab_tp_list, function(ab_tp) parse_motifs_in_two_alleles(ab_tp, cht)) %>% bind_rows()
[1] "399_399_1"
[1] "399_399_2"
[1] "vgn_28_1"
[1] "vgn_28_2"
[1] "vgn_307_1"
[1] "vgn_307_2"
[1] "vgn_399_1"
[1] "vgn_399_2"
[1] "vgn_57_1"
[1] "vgn_57_2"
[1] "vgn_639_1"
[1] "vgn_639_2"
[1] "vgn_712_1"
[1] "vgn_712_2"
[1] "vgn_714_1"
[1] "vgn_714_2"
[1] "vgn_852_1"
[1] "vgn_852_2"
[1] "vgn_vgn_1"
[1] "vgn_vgn_2"
#vid = "chr3R_9194649"
vid = "chr3R_9196035"
l = plot_ai_and_read_depth_for_variant(cht %>% filter(condition == "ctcf/68"), ll_ctcf, vid)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
generated.
outf = file.path(outdir_fig_main, paste0("Fig2A_example_ctcf_tot_counts.pdf"))
ggsave(outf, l[[1]], width = 3, height = 3)
outf = file.path(outdir_fig_main, paste0("Fig2A_example_ctcf_allele_ratios.pdf"))
ggsave(outf, l[[2]], width = 3, height = 3)
tmp = l[[3]]
tmp = cht %>% filter(peak_id == "chr3R_9196039_ctcf/68") %>% mutate(dist = TEST.SNP.POS - peak_summit)
p = ggplot(tmp, aes(x = dist, y = AI_abs, color = signif_strongAI)) + geom_point(size = 1) +
theme_bw() +
scale_color_manual(values = c("darkgrey", cbPalette[2])) +
xlab("Distance to peak summit") +
ylab("Allele imbalance")
p
[1] "399_399_1"
[1] "399_399_2"
[1] "vgn_28_1"
[1] "vgn_28_2"
[1] "vgn_307_1"
[1] "vgn_307_2"
[1] "vgn_399_1"
[1] "vgn_399_2"
[1] "vgn_57_1"
[1] "vgn_57_2"
[1] "vgn_639_1"
[1] "vgn_639_2"
[1] "vgn_712_1"
[1] "vgn_712_2"
[1] "vgn_714_1"
[1] "vgn_714_2"
[1] "vgn_852_1"
[1] "vgn_852_2"
[1] "vgn_vgn_1"
[1] "vgn_vgn_2"
vid = "chr3R_26862967"
l = plot_ai_and_read_depth_for_variant(cht %>% filter(condition == "bin/68"), ll_bin, vid)
outf = file.path(outdir_fig_main, paste0("Fig2B_example_bin68_tot_counts.pdf"))
ggsave(outf, l[[1]], width = 3, height = 3)
outf = file.path(outdir_fig_main, paste0("Fig2B_example_bin68_allele_ratios.pdf"))
ggsave(outf, l[[2]], width = 3, height = 3)
l[[3]] %>% select(sample_id, TEST.SNP.HAPLOTYPE) %>% arrange(TEST.SNP.HAPLOTYPE)
# peak - chr3R_26863200_bin/68
tmp = cht %>% filter(peak_id == "chr3R_26863200_bin/68") %>% mutate(dist = TEST.SNP.POS - peak_summit)
p = ggplot(tmp, aes(x = dist, y = AI_abs, color = signif_strongAI)) + geom_point(size = 1) +
theme_bw() +
scale_color_manual(values = c("darkgrey", cbPalette[2])) +
xlab("Distance to peak summit") +
ylab("Allele imbalance")
p
outf = file.path(outdir_fig_main, paste0("Fig2B_example_bin_all_variants_for_peak.pdf"))
ggsave(outf, p, width = 4, height = 2)
Get variants completely linked with chr3R_26862967 (peak chr3R_26863200_bin/68)
variants = cht %>% filter(condition == "bin/68" & peak_id == "chr3R_26863200_bin/68" & signif_strongAI) %>% select(snp_id) %>% unlist()
lines_var = lapply(ll_bin, function(l) l %>% mutate(snp_id = paste(CHROM, TEST.SNP.POS, sep = "_")) %>% filter(snp_id %in% variants)) %>% bind_rows()
lines_var %>% select(snp_id, sample_id, TEST.SNP.GENOTYPE) %>% unique() %>% spread(sample_id, TEST.SNP.GENOTYPE)
min.p = 1e-20
for(ab_tp in ab_tp_list) {
print(ab_tp)
filenames = c("cht_results.txt", "cht_results_as.txt", "cht_results_bnb.txt", "cht_results_permuted.txt")
data_labels = c("CHT full", "CHT AS", "CHT BNB", "CHT permuted")
names(data_labels) = filenames
res_qq = lapply(filenames, function(f) {cht = load_cht_results(ab_tp, file_name = f);
lab = data_labels[f];
N = cht %>% filter(signif_strongAI) %>% select(snp_id) %>% unique() %>% nrow()
lab = paste(lab, paste0("N=", N), sep = "\n")
cat(lab)
get_qq_values(cht, lab)
}) %>% bind_rows()
res_qq$label = factor(res_qq$label, levels = unique(res_qq$label))
p = ggplot(res_qq, aes(x, y, color = label)) +
geom_point_rast(size = 0.2) +
geom_abline(intercept = 0, slope = 1) +
xlab("Null -log10 p-values") +
ylab("Observed -log10 p-values") +
theme_bw() +
guides(colour = guide_legend(override.aes = list(size=6))) +
scale_color_manual(name = "", values = cbPalette) +
theme(axis.text.y = element_text(size=16), axis.text.x = element_text(size=16),
axis.title.x = element_text(size=16), axis.title.y = element_text(size=16),
legend.text=element_text(size=14), legend.title=element_text(size=16),
legend.key = element_rect(size = 5),
legend.key.size = unit(3, 'lines'))
print(p)
outf = file.path(outdir_fig_main, paste0("Fig2C_qqplot_rastr_", gsub("\\/", "_", ab_tp), ".pdf"))
ggsave(outf, p, width = 6, height = 4)
}
[1] "twi/24"
CHT full
N=13542CHT AS
N=15790CHT BNB
N=501CHT permuted
N=91
Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
generated.
[1] "ctcf/68"
CHT full
N=12789CHT AS
N=13139CHT BNB
N=4788CHT permuted
N=255
[1] "mef2/68"
CHT full
N=7956CHT AS
N=9021CHT BNB
N=1780CHT permuted
N=40
[1] "mef2/1012"
CHT full
N=9889CHT AS
N=11548CHT BNB
N=2346CHT permuted
N=83
[1] "bin/68"
CHT full
N=5801CHT AS
N=6812CHT BNB
N=672CHT permuted
N=16
[1] "bin/1012"
CHT full
N=4253CHT AS
N=4719CHT BNB
N=398CHT permuted
N=12
# Number of peaks with variants per condition
cht %<>%
group_by(condition) %>%
mutate(n_peaks = length(unique(peak_id))) %>%
ungroup()
# Number of peaks affected by cis variation
cht_sel = cht %>%
filter(signif_strongAI) %>%
group_by(condition) %>%
mutate(n_AI_peaks = length(unique(peak_id)),
share_AI_peaks = n_AI_peaks / mean(n_peaks))
# Select closest top variant to peak summit (min p-value)
cht_sel %<>%
group_by(condition, peak_id) %>%
mutate(min_pval = min(P.VALUE)) %>% # select variants with min p-value
filter(P.VALUE == min_pval) %>%
mutate(min_dist2summit = min(dist2summit)) %>%
filter(dist2summit == min_dist2summit) %>%
mutate(bin_dist = cut(dist2summit, breaks = c(0, 250, 1250, 2500), labels = c("in peak", "<= 1kB", "> 1kB"), include.lowest = T))
df_sum = cht_sel %>%
group_by(condition, bin_dist, share_AI_peaks) %>%
summarise(n_bin_dist = n(), share_bin_dist = n_bin_dist / mean(n_AI_peaks), tot_AI = mean(n_AI_peaks))
`summarise()` has grouped output by 'condition', 'bin_dist'. You can override
using the `.groups` argument.
df_sum$label = factor(ab_tp_labels[df_sum$condition], levels = ab_tp_labels)
df_sum$bin_dist = factor(df_sum$bin_dist, levels = rev(levels(df_sum$bin_dist)))
label_df = df_sum %>%
group_by(label, share_AI_peaks) %>%
summarize(n_AI_peaks = sum(n_bin_dist), share_AI_peaks = round(share_AI_peaks, 2)) %>%
unique()
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
generated.
`summarise()` has grouped output by 'label', 'share_AI_peaks'. You can override
using the `.groups` argument.
p = ggplot(df_sum, aes(x = label, y = n_bin_dist, fill = bin_dist)) +
geom_bar(stat = "identity", width = 0.6) +
scale_fill_manual(values = cbPalette, name = "Distance to summit\n(nearest top-sign. variant)") +
annotate("text", x = 1:6, y = label_df$n_AI_peaks + 30, label = label_df$share_AI_peaks, size = 5) +
xlab("") +
ylab("# of peaks") +
theme_bw() +
theme(axis.text.x = element_text(size=16, angle = 45, hjust = 1, color = TFcols), axis.text.y = element_text(size=16),
axis.title.x = element_text(size=18), axis.title.y = element_text(size=16),
legend.text=element_text(size=14), legend.title=element_text(size=14))
Warning: Vectorized input to `element_text()` is not officially supported.
ℹ Results may be unexpected or may change in future versions of ggplot2.
outf = file.path(outdir_fig_main, paste0("Fig2D_number_aff_peaks.pdf"))
ggsave(outf, p, width = 7, height = 5)
p
# CHT results
cht = lapply(ab_tp_list, function(ab_tp) load_cht_results(ab_tp)) %>% bind_rows()
cht_gr = cht %>% mutate(chr = TEST.SNP.CHROM, start = TEST.SNP.POS, end = TEST.SNP.POS) %>% GRanges()
# F1 ATAC-seq results
# coordinates of ATAC peaks
atac_regions_path = "/g/furlong/project/68_F1_cisreg_ichip/data/F1_paper_multiom/ATAC_feature_location_dm6.bed"
atac_regions = import(atac_regions_path, format = "bed")
# imbalance info for ATAC peaks
atac_ai_path = "/g/furlong/project/68_F1_cisreg_ichip/data/F1_paper_multiom/ATAC_all_peaks_atac_X.txt"
atac_ai = read.delim(atac_ai_path)
# additional annotations for allele imbalance
atac_ai_sum = atac_ai %>%
mutate(AI = padj < 0.01 & abs(0.5 - meanprop) > 0.1) %>%
group_by(feature, time) %>%
summarize(n_AI = sum(AI),
AI_peak = any(AI),
mean_AI = abs(0.5 - mean(meanprop)),
max_AI = max(abs(0.5 - meanprop)),
n_het_lines =n())
`summarise()` has grouped output by 'feature'. You can override using the
`.groups` argument.
# combine AII info with peak coordinates
atac_f1_df = merge(data.frame(atac_regions), atac_ai_sum, by.x = "name", by.y = "feature") %>% filter(!seqnames %in% c("chrX", "chrY"))
atac_ai_gr = GRanges(atac_f1_df)
# Do the analysis per TF and ATAC time-point
df = data.frame()
for (ab_tp in ab_tp_list) {
tp = gsub("-", "", timepoints[ab_tp])
# get TF peaks
AI_peaks = get_peaks_from_cht(ab_tp, cht)
names(mcols(AI_peaks)) = paste0(names(mcols(AI_peaks)), ".tf")
# get ATAC peaks for selected time-point
AI_atac = atac_ai_gr[atac_ai_gr$time == tp]
names(mcols(AI_atac)) = paste0(names(mcols(AI_atac)), ".atac")
ov = findOverlaps(AI_peaks, AI_atac)
res = cbind.data.frame(mcols(AI_peaks[queryHits(ov)]), mcols(AI_atac[subjectHits(ov)]))
print(ab_tp)
wt = wilcox.test(mean_AI.atac ~ AI_peak.tf, res, alternative = "less")
print(wt)
wt = wilcox.test(max_AI.atac ~ AI_peak.tf, res, alternative = "less")
print(wt)
df = rbind.data.frame(df, res)
}
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
[1] "twi/24"
Wilcoxon rank sum test with continuity correction
data: mean_AI.atac by AI_peak.tf
W = 1408786, p-value <2e-16
alternative hypothesis: true location shift is less than 0
Wilcoxon rank sum test with continuity correction
data: max_AI.atac by AI_peak.tf
W = 1317170, p-value <2e-16
alternative hypothesis: true location shift is less than 0
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
[1] "ctcf/68"
Wilcoxon rank sum test with continuity correction
data: mean_AI.atac by AI_peak.tf
W = 493598, p-value = 1.2e-11
alternative hypothesis: true location shift is less than 0
Wilcoxon rank sum test with continuity correction
data: max_AI.atac by AI_peak.tf
W = 444450, p-value <2e-16
alternative hypothesis: true location shift is less than 0
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
[1] "mef2/68"
Wilcoxon rank sum test with continuity correction
data: mean_AI.atac by AI_peak.tf
W = 947570, p-value <2e-16
alternative hypothesis: true location shift is less than 0
Wilcoxon rank sum test with continuity correction
data: max_AI.atac by AI_peak.tf
W = 879208, p-value <2e-16
alternative hypothesis: true location shift is less than 0
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
[1] "mef2/1012"
Wilcoxon rank sum test with continuity correction
data: mean_AI.atac by AI_peak.tf
W = 1304888, p-value <2e-16
alternative hypothesis: true location shift is less than 0
Wilcoxon rank sum test with continuity correction
data: max_AI.atac by AI_peak.tf
W = 1237730, p-value <2e-16
alternative hypothesis: true location shift is less than 0
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
[1] "bin/68"
Wilcoxon rank sum test with continuity correction
data: mean_AI.atac by AI_peak.tf
W = 416654, p-value <2e-16
alternative hypothesis: true location shift is less than 0
Wilcoxon rank sum test with continuity correction
data: max_AI.atac by AI_peak.tf
W = 417385, p-value <2e-16
alternative hypothesis: true location shift is less than 0
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
[1] "bin/1012"
Wilcoxon rank sum test with continuity correction
data: mean_AI.atac by AI_peak.tf
W = 397406, p-value = 3.2e-07
alternative hypothesis: true location shift is less than 0
Wilcoxon rank sum test with continuity correction
data: max_AI.atac by AI_peak.tf
W = 369466, p-value = 1.5e-11
alternative hypothesis: true location shift is less than 0
df %<>%
mutate(label = factor(ab_tp_labels[condition.tf], levels = ab_tp_labels))
df %>% group_by(condition.tf, AI_peak.tf) %>%
summarize(mean(mean_AI.atac, mean(max_AI.atac)))
`summarise()` has grouped output by 'condition.tf'. You can override using the
`.groups` argument.
p = ggplot(df, aes(x = AI_peak.tf, y = max_AI.atac)) +
geom_violin(fill = "darkblue", alpha = 0.3) +
geom_boxplot(width = 0.4, outlier.size = 0.1, fill = "darkblue", alpha = 0.7) +
theme_bw() +
stat_compare_means() +
xlab("AI TF peaks (F1)") +
ylab("Allele imbalance of ATAC peaks (F1)") +
theme(axis.text.y = element_text(size=12), axis.text.x = element_text(size=12),
axis.title.x = element_text(size=14), axis.title.y = element_text(size=14))
p
# only quantified peaks for CHT
peaks = lapply(ab_tp_list, function(ab_tp) get_peaks_from_cht(ab_tp, cht, as_granges = T))
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
`summarise()` has grouped output by 'peak_id', 'condition', 'chr', 'start'. You
can override using the `.groups` argument.
names(peaks) = ab_tp_list
peaks = lapply(peaks, function(x) {ov = findOverlaps(x, promoters)
x$TSS = "TSS-distal"
x$TSS[unique(queryHits(ov))] = "TSS-proximal"
x})
peaks_df = lapply(peaks, function(x) as.data.frame(x)) %>% bind_rows()
fts = lapply(ab_tp_list, function(ab_tp) {fisher_test_two_groups(peaks_df, ab_tp,
group1 = "AI_peak", group1_val = c(TRUE, FALSE),
group2 = "TSS", group2_val = c("TSS-distal", "TSS-proximal"))}) %>%
bind_rows()
fts$label = ab_tp_labels
fts$label = factor(fts$label, levels = ab_tp_labels)
fts %<>% mutate(ratio_label = paste(round(r1, 2), round(r2, 2), sep = ":"))
print(fts)
condition comp1 comp2 n11 n12 n21
1 twi/24 AI_peak_TRUE_vs_FALSE TSS_TSS-distal_vs_TSS-proximal 563 280 2031
2 ctcf/68 AI_peak_TRUE_vs_FALSE TSS_TSS-distal_vs_TSS-proximal 344 237 854
3 mef2/68 AI_peak_TRUE_vs_FALSE TSS_TSS-distal_vs_TSS-proximal 342 220 1689
4 mef2/1012 AI_peak_TRUE_vs_FALSE TSS_TSS-distal_vs_TSS-proximal 466 219 2079
5 bin/68 AI_peak_TRUE_vs_FALSE TSS_TSS-distal_vs_TSS-proximal 229 155 1001
6 bin/1012 AI_peak_TRUE_vs_FALSE TSS_TSS-distal_vs_TSS-proximal 189 124 1164
n22 r1 r2 pval odds_ratio label ratio_label
1 2480 0.66785 0.45023 1.7999e-31 2.4548 Twi, 2-4h 0.67:0.45
2 1722 0.59208 0.33152 1.1049e-30 2.9257 CTCF, 6-8h 0.59:0.33
3 2497 0.60854 0.40349 4.9357e-20 2.2978 Mef2, 6-8h 0.61:0.4
4 2880 0.68029 0.41924 4.6753e-38 2.9471 Mef2, 10-12h 0.68:0.42
5 1999 0.59635 0.33367 9.5351e-23 2.9494 Bin, 6-8h 0.6:0.33
6 1884 0.60383 0.38189 5.0732e-14 2.4663 Bin, 10-12h 0.6:0.38
p = ggplot(fts, aes(x = label, y = odds_ratio)) +
geom_hline(yintercept = 1, color = "darkred") +
geom_bar(aes(fill = -log10(pval)), color = "darkblue", stat = "identity", position = "dodge", width = 0.5) +
#scale_fill_manual(name = "", values = c("darkblue", "darkgrey"), labels = c("AI peaks", "non-AI peaks")) +
geom_text(aes(label = round(r1, 2), x = label, y = odds_ratio + 0.1), data = fts, size = 6) +
ylab("Fisher's Test Odds Ratio") +
theme_bw() +
theme(axis.text.x = element_text(size = 16, angle = 45, hjust = 1, colour = TFcols),
axis.text.y = element_text(size = 14),
axis.title.x = element_blank(),
axis.title.y = element_text(size = 16),
legend.text = element_text(size=14),
legend.title = element_text(size=14))
Warning: Vectorized input to `element_text()` is not officially supported.
ℹ Results may be unexpected or may change in future versions of ggplot2.
outf = file.path(outdir_fig_main, paste0("Fig2F_tss_dist_prox_fisher.pdf"))
ggsave(outf, p, width = 5, height = 5)
res = lapply(c("TSS-distal", "TSS-proximal"), function(TSS_type) {
ps = lapply(peaks, function(x) {x[x$TSS == TSS_type]})
lapply(ab_tp_list, function(ab_tp) fisher_test_AIpeaks_cobinding(ab_tp, ps) %>%
mutate(TSS = TSS_type)) %>% bind_rows()}) %>%
bind_rows()
res$label = ab_tp_labels
res$label = factor(res$label, levels = ab_tp_labels)
print(res)
condition comp1 comp2 n11 n12 n21 n22
1 twi/24 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 216 347 927 1104
2 ctcf/68 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 133 211 479 375
3 mef2/68 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 190 152 1122 567
4 mef2/1012 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 220 246 1123 956
5 bin/68 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 137 92 720 281
6 bin/1012 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 104 85 748 416
7 twi/24 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 214 66 2137 343
8 ctcf/68 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 184 53 1616 106
9 mef2/68 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 173 47 2196 301
10 mef2/1012 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 152 67 2386 494
11 bin/68 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 134 21 1821 178
12 bin/1012 isAI_TRUE_vs_FALSE is_co_binding_TRUE_vs_FALSE 101 23 1697 187
r1 r2 pval odds_ratio TSS label
1 0.38366 0.45643 2.1353e-03 0.74142 TSS-distal Twi, 2-4h
2 0.38663 0.56089 5.1600e-08 0.49376 TSS-distal CTCF, 6-8h
3 0.55556 0.66430 1.5295e-04 0.63181 TSS-distal Mef2, 6-8h
4 0.47210 0.54016 8.8023e-03 0.76141 TSS-distal Mef2, 10-12h
5 0.59825 0.71928 4.4116e-04 0.58142 TSS-distal Bin, 6-8h
6 0.55026 0.64261 1.8332e-02 0.68069 TSS-distal Bin, 10-12h
7 0.76429 0.86169 3.9172e-05 0.52056 TSS-proximal Twi, 2-4h
8 0.77637 0.93844 1.0712e-13 0.22798 TSS-proximal CTCF, 6-8h
9 0.78636 0.87946 2.0268e-04 0.50467 TSS-proximal Mef2, 6-8h
10 0.69406 0.82847 2.7561e-06 0.46984 TSS-proximal Mef2, 10-12h
11 0.86452 0.91096 6.0911e-02 0.62389 TSS-proximal Bin, 6-8h
12 0.81452 0.90074 5.6340e-03 0.48412 TSS-proximal Bin, 10-12h
p = ggplot(res, aes(x = label, y = odds_ratio)) +
facet_wrap(~TSS) +
geom_hline(yintercept = 1, color = "darkred") +
geom_bar(aes(fill = -log10(pval)), color = "darkblue", stat = "identity", position = "dodge", width = 0.5) +
#scale_fill_manual(name = "", values = c("darkblue", "darkgrey"), labels = c("AI peaks", "non-AI peaks")) +
geom_text(aes(label = round(r1, 2), x = label, y = odds_ratio + 0.05), data = res, size = 6) +
ylab("Fisher's Test Odds Ratio") +
ylim(0,3) +
theme_bw() +
theme(axis.text.x = element_text(size = 16, angle = 45, hjust = 1, colour = TFcols),
axis.text.y = element_text(size = 14),
axis.title.x = element_blank(),
axis.title.y = element_text(size = 16),
legend.text = element_text(size=14),
legend.title = element_text(size=14))
Warning: Vectorized input to `element_text()` is not officially supported.
ℹ Results may be unexpected or may change in future versions of ggplot2.