Commit 72aecb63 authored by Bernd Klaus

added final version of the sample boxplot

parent 18746b2a
......@@ -187,7 +187,7 @@ Then, we join the annotation to the tidy table
```{r tidy_count}
mtec_counts_tidy <- gather(mtec_counts, key = "cell_id", value = "count",
-ensembl_id) %>%
mutate(is_tra = ensembl_id %in% tras$gene.ids,
dplyr::mutate(is_tra = ensembl_id %in% tras$gene.ids,
is_detected = count > 0) %>%
left_join(mtec_cell_anno,
by = c("cell_id" = "cellID"))
......@@ -203,13 +203,13 @@ we go from genes within a single cell as our unit to single cells.
```{r tra_per_cell}
tra_detected <- filter(mtec_counts_tidy, is_detected == TRUE,
tra_detected <- dplyr::filter(mtec_counts_tidy, is_detected == TRUE,
SurfaceMarker == "None") %>%
mutate(is_tra = ifelse(is_tra, "tra", "not_a_tra")) %>%
dplyr::mutate(is_tra = ifelse(is_tra, "tra", "not_a_tra")) %>%
group_by(cell_id, is_tra) %>%
tally() %>%
spread(key = is_tra, value = n) %>%
mutate(total_detected = sum(tra, not_a_tra))
dplyr::mutate(total_detected = sum(tra, not_a_tra))
tra_detected
```
......@@ -467,7 +467,7 @@ The annotation for the genes looks like this:
```{r pre_crc_genes}
colnames(colData(crc_data))
tail(colnames(colData(crc_data)))
colData(crc_data)[1:5, c("title", "mapped_read_count")]
nrow(colData(crc_data))
```
......@@ -481,37 +481,66 @@ as "mRNA". We will now create a column data table that contains the appropriate
sample annotation.
Specifically, we have to extract the sample annotations from
the title column and subset on the samples processed using mRNA
based quantification.
the `title` column and subset on the samples processed using mRNA
based quantification. This can be done using [regular expressions](http://www.zytrax.com/tech/web/regex.htm) and
the function `extract` from `r CRANpkg("tidyr")`.
```{r create_crc_col_data}
col_data_crc <- select(as.data.frame(colData(crc_data)),
title, characteristics, mapped_read_count) %>%
title, mapped_read_count) %>%
rownames_to_column(var = "sample_id") %>%
as_tibble() %>%
tidyr::extract(title, into = c("quantification", "patient", "tissue"),
regex = "([[:alnum:]]+)_([[:alnum:]]+)_([[:alnum:]]+)") %>%
dplyr::filter(quantification == "mRNA")
dplyr::filter(quantification == "mRNA")
```
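To make the regular-expression step concrete, here is a minimal sketch on a toy table. The sample ids and titles below are hypothetical and only assume that a title has the form `quantification_patient_tissue`; the real titles in `crc_data` may differ.

```r
library(tidyr)
library(dplyr)

# Hypothetical sample ids and titles (assumed pattern: quantification_patient_tissue)
toy_col_data <- tibble::tibble(
  sample_id = c("S1", "S2", "S3"),
  title = c("mRNA_AMC2_normal", "mRNA_AMC2_tumor", "miRNA_AMC3_normal")
)

# Each parenthesised group in the regex becomes one new column;
# the original title column is dropped by default
toy_split <- toy_col_data %>%
  tidyr::extract(title, into = c("quantification", "patient", "tissue"),
                 regex = "([[:alnum:]]+)_([[:alnum:]]+)_([[:alnum:]]+)") %>%
  dplyr::filter(quantification == "mRNA")

toy_split
```

Only the two toy rows whose title starts with `mRNA` remain after filtering.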
We now plot the log counts of the mRNA samples to see whether their distributions
are comparable. In order to avoid taking the log of zero, we only retain
a gene within a sample that has at least one count
We now plot the log counts of the mRNA samples via a boxplot to see whether
their distributions are comparable. We order the boxplot
by the sample median in order to see whether
there are large deviations between samples.
We compute the sample--wise medians by a grouping operation on the tidy counts
and then join them to the original table.
We also create a factor from the sample id that has the correct ordering
for the levels and join the column data of the samples to the tidy table.
```{r mrna_counts}
counts_crc_tidy <- rownames_to_column(data.frame(counts_crc), var = "ensembl_id") %>%
as_tibble() %>%
gather(key = "sample_id", value = "count", -ensembl_id) %>%
dplyr::filter(sample_id %in% col_data_crc$sample_id) %>%
dplyr::filter(count > 1)
ggplot(counts_crc_tidy, aes(x = sample_id, y = log2(count), fill = sample_id) ) +
geom_boxplot()
dplyr::filter(sample_id %in% col_data_crc$sample_id)
sample_medians <- group_by(counts_crc_tidy, sample_id) %>%
dplyr::filter(count > 0) %>%
summarize(sample_median = median(log2(count)))
counts_crc_tidy <- left_join(counts_crc_tidy, sample_medians,
by = c("sample_id" = "sample_id")) %>%
dplyr::arrange(sample_median) %>%
dplyr::mutate(sample_id_by_median = as_factor(sample_id)) %>%
left_join(col_data_crc)
count_boxplot <- ggplot(counts_crc_tidy,
aes(x = sample_id_by_median,
y = log2(count),
fill = sample_id) ) +
geom_boxplot() +
ylim(c(0, 10)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
```
We can see that the count distributions are not very different between the
samples. Here, we limited the y--axis in order to ignore the outliers and tilted
the x--axis labels so that we can actually read them.
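The median-based ordering can be checked on a small example. The sketch below uses a made-up table (`toy_counts` and its values are hypothetical) and relies on `as_factor` from `forcats` creating the levels in order of first appearance, which is why the table is sorted by the median first.

```r
library(dplyr)
library(forcats)

# Hypothetical counts for three samples
toy_counts <- tibble::tibble(
  sample_id = rep(c("s_a", "s_b", "s_c"), each = 3),
  count = c(8, 16, 32, 1, 2, 4, 0, 64, 256)
)

# Median of the log2 counts per sample, ignoring zeros
toy_medians <- toy_counts %>%
  dplyr::filter(count > 0) %>%
  group_by(sample_id) %>%
  summarize(sample_median = median(log2(count)))

# Sort by the median, then turn the sample id into a factor;
# the levels follow the row order and hence the medians
toy_ordered <- left_join(toy_counts, toy_medians, by = "sample_id") %>%
  dplyr::arrange(sample_median) %>%
  dplyr::mutate(sample_id_by_median = as_factor(sample_id))

levels(toy_ordered$sample_id_by_median)
```

The levels come out as `s_b`, `s_a`, `s_c`, i.e. in increasing order of the sample median, so a boxplot with `sample_id_by_median` on the x--axis is sorted accordingly.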
## Exercise:
<!-- the median gene expression per sample relative to a reference -->
......