Commit 5d90876f authored by Bernd Klaus

small edits to the testing etc. lab

parent b8384062
......@@ -355,7 +355,7 @@ anova(lm(as.vector(sva_res$sv) ~ factors_rld_batch$gen))
## ----svdesign------------------------------------------------------------
colData(dds_batch)$sv <- as.vector(sva_res$sv)
design(dds_batch) <- ~ sv + condition
design(dds_batch) <- ~ sv + sex + condition
dds_batch_sva <- DESeq(dds_batch)
res_sva <- results(dds_batch_sva)
summary(res_sva)
......
......@@ -615,6 +615,10 @@ factor analysis. Factor analysis is often motivated as
the term factor analysis is generally used for models which include
sums of random variables.
For advanced factor analysis methods in 'omics research see:
@Buettner_2017 and @Argelaguet_2017.
## Factor analysis to tackle batch effects
We can now try factor analysis on our data set: We will use the
......@@ -846,7 +850,7 @@ we have:
```{r svdesign}
colData(dds_batch)$sv <- as.vector(sva_res$sv)
design(dds_batch) <- ~ sv + condition
design(dds_batch) <- ~ sv + sex + condition
dds_batch_sva <- DESeq(dds_batch)
res_sva <- results(dds_batch_sva)
summary(res_sva)
......@@ -864,7 +868,7 @@ though that sva can actually identify the genetic background from the data.
the influence of the cell cycle from the gene expression data. Briefly,
@Buettner_2015 perform a factor analysis--like algorithm to estimate
a latent factor on gene annotated to cell--cycle. Here, we want to use
the method of @Risso_2017 instead.
the method of @Risso_2018 instead.
They propose a zero--inflated Negative Binomial Model for (sc--) RNA-Seq
data implemented in the package `r Biocpkg("zinbwave")`. In contrast to
......@@ -1208,7 +1212,7 @@ Traditionally, wave-like patterns in PC maps have been interpreted as
migration events. However, as @Novembre_2008 show, these patterns
arise naturally as soon as genetic similarity decays with distance.
For a comprehensive trajectory workflow using the `r Biocpkg("zinbwave")` and `r Biocpkg("clusterExperiment") `, see the paper by @Perraudeau_2017.
For a comprehensive trajectory workflow using the `r Biocpkg("zinbwave")` and `r Biocpkg("clusterExperiment") ` packages, see the paper by @Perraudeau_2017.
# Checking the clustering using machine learning
......@@ -1370,9 +1374,9 @@ predfun_rf(data_for_cl[train_idx,], genes_clusters$labs[train_idx],
The error rate estimate for cluster 10 is considerably lower than the out of bag
error. Let's see whether we can confirm this via cross validation. We split
the data set repeatedly into F = 8 folds, and use each of these folds once for
the data set repeatedly into K = 10 folds, and use each of these folds once for
prediction (and the others for training). The number of repeats is B = 10 times,
giving us 80 estimates of the two error rates in total.
giving us 100 estimates of the two error rates in total.
In general, choosing a low number of folds will increase the bias, while
a large number of repetitions will decrease the variability of the estimate.
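
As a rough illustration of the repeated cross validation scheme described above (K = 10 folds, B = 10 repeats, hence 100 estimates), here is a minimal base R sketch. It is not the lab's actual code: the toy data, the single overall error rate (the lab tracks two), and the helper `fit_and_error` (a nearest-class-mean classifier standing in for the random forest prediction function) are all assumptions made for illustration.

```r
# Sketch of repeated K-fold cross validation in base R.
# `fit_and_error` is a hypothetical stand-in for the lab's prediction function.
set.seed(123)

n <- 200   # number of observations (toy value)
K <- 10    # folds per repetition
B <- 10    # number of repetitions

# Toy data: two classes separated by a mean shift in feature f1
x <- data.frame(f1 = rnorm(n), f2 = rnorm(n))
y <- factor(rep(c("a", "b"), each = n / 2))
x$f1[y == "b"] <- x$f1[y == "b"] + 1.5

# Hypothetical classifier: assign each test point to the nearest class mean of f1
fit_and_error <- function(train_x, train_y, test_x, test_y) {
  centers <- tapply(train_x$f1, train_y, mean)
  dists <- abs(outer(test_x$f1, as.numeric(centers), "-"))
  pred <- names(centers)[apply(dists, 1, which.min)]
  mean(pred != as.character(test_y))  # misclassification rate on the held-out fold
}

cv_errors <- unlist(lapply(seq_len(B), function(b) {
  # Randomly assign observations to K folds of (nearly) equal size
  fold_id <- sample(rep(seq_len(K), length.out = n))
  vapply(seq_len(K), function(k) {
    test_idx <- fold_id == k
    # Each fold is used once for prediction, the remaining folds for training
    fit_and_error(x[!test_idx, ], y[!test_idx], x[test_idx, ], y[test_idx])
  }, numeric(1))
}))

length(cv_errors)   # K * B = 100 error estimates
summary(cv_errors)  # spread of the estimates across folds and repeats
```

Redrawing the fold assignment in every repetition is what makes the B repeats informative: each repeat yields a different partition, so the spread of `cv_errors` reflects the variability due to the split.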
......@@ -1403,9 +1407,9 @@ cv_plot
We can see that the error rate estimates are highly variable for cluster 10,
it seems that some genes in it are harder to predict than others. This might be
in line with Figure 1b of the original paper, which indicates that the
in line with Figure 2b of the original paper, which indicates that the
within--cluster correlations are potentially largely driven by just a small
number of single cells: If they are selected by the decision tree,
number of single cells: If they are in the training and test set,
the prediction works well,
otherwise it does not.
......
This diff is collapsed.
......@@ -38,7 +38,7 @@ citep("10.1198/106186008X318440")
citep("10.1007/BF02289565")
citep("10.1101/125112")
#citep("10.1101/125112")
# Perraudeau et. al. , 2017
citep("10.12688/f1000research.12122.1")
......@@ -51,7 +51,7 @@ citep("10.1093/biostatistics/kxv027")
citep("10.1186/s12859-015-0808-5")
# Julia et. al. 2015
citep("10.1093/bioinformatics/btv368")
#citep("10.1093/bioinformatics/btv368")
# Novembre and Stephens, 2008
citep("10.1038/ng.139")
......@@ -90,6 +90,10 @@ citep("10.1186/s13059-015-0844-5")
# journal = {bioRxiv}
# }")
# Buettner et. al. 2017
citep("10.1186/s13059-017-1334-8")
### export citations
......@@ -162,3 +166,34 @@ add_manually("@article {Delgado_14,
pages = {3133-3181},
url = {http://jmlr.org/papers/v15/delgado14a.html}
}")
# Argelaguet et. al. 2017
add_manually("@article {Argelaguet_2017,
author = {Argelaguet, Ricard and Velten, Britta and Arnol, Damien and Dietrich, Sascha and Zenz, Thorsten and Marioni, John C. and Buettner, Florian and Huber, Wolfgang and Stegle, Oliver},
title = {Multi-Omics factor analysis disentangles heterogeneity in blood cancer},
year = {2017},
doi = {10.1101/217554},
publisher = {Cold Spring Harbor Laboratory},
abstract = {Multi-omic studies in large cohorts promise to characterize biological processes across molecular layers including genome, transcriptome, epigenome, proteome and perturbation phenotypes. However, methods for integrating multi-omic datasets are lacking. We present Multi-Omics Factor Analysis (MOFA), an unsupervised dimensionality reduction method for discovering the driving sources of variation in multi-omics data. Our model infers a set of (hidden) factors that capture biological and technical sources of variability across data modalities. We applied MOFA to data from 200 patient samples of chronic lymphocytic leukemia (CLL) profiled for somatic mutations, RNA expression, DNA methylation and ex-vivo responses to a panel of drugs. MOFA automatically discovered the known dimensions of disease heterogeneity, including immunoglobulin heavy chain variable region (IGHV) status and trisomy of chromosome 12, as well as previously underappreciated drivers of variation, such as response to oxidative stress. These factors capture key dimensions of patient heterogeneity, including those linked to clinical outcomes. Finally, MOFA handles missing data modalities in subsets of samples, enabling imputation, and the model can identify outlier samples.},
URL = {https://www.biorxiv.org/content/early/2017/11/10/217554},
eprint = {https://www.biorxiv.org/content/early/2017/11/10/217554.full.pdf},
journal = {bioRxiv}
}")
# Jula et. al., 2015
add_manually("
@Article{Juli__2015,
doi = {10.1093/bioinformatics/btv368},
url = {https://doi.org/10.1093/bioinformatics/btv368},
year = {2015},
month = {jun},
publisher = {Oxford University Press ({OUP})},
volume = {31},
number = {20},
pages = {3380--3382},
author = {Miguel Julia and Amalio Telenti and Antonio Rausell},
title = {Sincell: an R/Bioconductor package for statistical assessment of cell-state hierarchies from single-cell {RNA}-seq: Fig. 1.},
journal = {Bioinformatics},
}")
......@@ -54,6 +54,19 @@
journal = {Nature Biotechnology},
}
@Article{Buettner_2017,
doi = {10.1186/s13059-017-1334-8},
url = {https://doi.org/10.1186/s13059-017-1334-8},
year = {2017},
month = {nov},
publisher = {Springer Nature},
volume = {18},
number = {1},
author = {Florian Buettner and Naruemon Pratanwanich and Davis J. McCarthy and John C. Marioni and Oliver Stegle},
title = {f-{scLVM}: scalable and versatile factor analysis for single-cell {RNA}-seq},
journal = {Genome Biology},
}
@Article{Buja_2008,
doi = {10.1198/106186008x318440},
url = {https://doi.org/10.1198/106186008x318440},
......@@ -108,20 +121,6 @@
journal = {{BMC} Bioinformatics},
}
@Article{Juli__2015,
doi = {10.1093/bioinformatics/btv368},
url = {https://doi.org/10.1093/bioinformatics/btv368},
year = {2015},
month = {jun},
publisher = {Oxford University Press ({OUP})},
volume = {31},
number = {20},
pages = {3380--3382},
author = {Miguel Jul {\a'a} and Amalio Telenti and Antonio Rausell},
title = {Sincell: an R/Bioconductor package for statistical assessment of cell-state hierarchies from single-cell {RNA}-seq: Fig. 1.},
journal = {Bioinformatics},
}
@Article{Kruskal_1964,
doi = {10.1007/bf02289565},
url = {https://doi.org/10.1007/bf02289565},
......@@ -344,3 +343,36 @@
}
@article {Argelaguet_2017,
author = {Argelaguet, Ricard and Velten, Britta and Arnol, Damien and Dietrich, Sascha and Zenz, Thorsten and Marioni, John C. and Buettner, Florian and Huber, Wolfgang and Stegle, Oliver},
title = {Multi-Omics factor analysis disentangles heterogeneity in blood cancer},
year = {2017},
doi = {10.1101/217554},
publisher = {Cold Spring Harbor Laboratory},
abstract = {Multi-omic studies in large cohorts promise to characterize biological processes across molecular layers including genome, transcriptome, epigenome, proteome and perturbation phenotypes. However, methods for integrating multi-omic datasets are lacking. We present Multi-Omics Factor Analysis (MOFA), an unsupervised dimensionality reduction method for discovering the driving sources of variation in multi-omics data. Our model infers a set of (hidden) factors that capture biological and technical sources of variability across data modalities. We applied MOFA to data from 200 patient samples of chronic lymphocytic leukemia (CLL) profiled for somatic mutations, RNA expression, DNA methylation and ex-vivo responses to a panel of drugs. MOFA automatically discovered the known dimensions of disease heterogeneity, including immunoglobulin heavy chain variable region (IGHV) status and trisomy of chromosome 12, as well as previously underappreciated drivers of variation, such as response to oxidative stress. These factors capture key dimensions of patient heterogeneity, including those linked to clinical outcomes. Finally, MOFA handles missing data modalities in subsets of samples, enabling imputation, and the model can identify outlier samples.},
URL = {https://www.biorxiv.org/content/early/2017/11/10/217554},
eprint = {https://www.biorxiv.org/content/early/2017/11/10/217554.full.pdf},
journal = {bioRxiv}
}
@Article{Juli__2015,
doi = {10.1093/bioinformatics/btv368},
url = {https://doi.org/10.1093/bioinformatics/btv368},
year = {2015},
month = {jun},
publisher = {Oxford University Press ({OUP})},
volume = {31},
number = {20},
pages = {3380--3382},
author = {Miguel Julia and Amalio Telenti and Antonio Rausell},
title = {Sincell: an R/Bioconductor package for statistical assessment of cell-state hierarchies from single-cell {RNA}-seq: Fig. 1.},
journal = {Bioinformatics},
}