Commit 01de3a55 authored by Thomas Schwarzl

added statistical testing

parent a0a2339e
@@ -172,7 +172,7 @@ Examples (details later)
# A sequence analysis package tour
![Alt Sequencing Ecosystem](our_figures/SequencingEcosystem.png){height=500}
@@ -495,6 +495,8 @@ Functions and methods
![](our_figures/SE_Description.png)
[SummarizedExperiment][]
# Integrated data representations: _SummarizedExperiment_
- 'feature' x 'sample' `assays()`
- `colData()` data frame for description of samples
- `rowRanges()` _GRanges_ / _GRangesList_ or data frame for description
@@ -13,67 +13,154 @@ _"To consult the statistician after an experiment is finished is often merely to
# Start with the analysis as soon as you have acquired the first data
Don't wait until everything is collected and it's too late to troubleshoot
# Start writing the paper while you're analyzing the data
Once you're writing, you realize what you should have done to properly support your conclusions
# Types of experiments
- measurements have limited precision and accuracy
- preliminary data to estimate them
- direct or indirect measurements
- side effects of treatment conditions
- interfering signals or ???background noise???
- interfering signals or "background noise"
- limited sample sizes
# Types of experiments
### controlled experiment
We control all relevant variables:

- the (model) system under study
- the environmental conditions
- the experimental readout
_e.g. a well-characterized cell line grows in laboratory conditions on defined media, temperature and atmosphere; we administer a precise amount of a drug, and after 72 h we measure the activity of a specific pathway reporter._
# Types of experiments
### study
Important conditions that may affect the measured outcome are not under the control of the researcher, usually because of ethical concerns or logistical constraints.
_e.g. in an ecological field study, this could be the weather, the availability of nutrition resources or the activity of predators_
# Types of experiments
### observational study
_e.g. in a clinical trial, this might be the assignment of the individual subjects to groups._
There are many possibilities for confounding: correlation is not causation!
# Types of experiments
### signal / noise ratio
_"Generally speaking, a well-designed experiment is one that is sufficiently powered and one in which technical artifacts and biological features that may systematically affect measurements are balanced, randomized or controlled in some other way in order to minimize opportunities for multiple explanations for the effect(s) under study."_ - Bacher and Kendziorski 2016
# Bias and noise
<center>
![](img/TargetVariance.png){height=150}
noise: "averages out" if we perform enough replicates
![](img/TargetBias.png){height=150}
bias: does not average out; it becomes more apparent with enough replicates
</center>
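A minimal simulation of the two panels, assuming one unbiased but noisy instrument (error sd 2) and one precise instrument with a systematic offset of +1; all numbers are hypothetical:

```{r}
set.seed(1)
true_value <- 10
n_rep <- 1000

# unbiased but noisy instrument: mean error 0, sd 2
noisy  <- true_value + rnorm(n_rep, mean = 0, sd = 2)
# precise but biased instrument: systematic offset of +1, sd 0.2
biased <- true_value + 1 + rnorm(n_rep, mean = 0, sd = 0.2)

# deviation of the replicate averages from the true value
mean(noisy)  - true_value   # close to 0: the noise averages out
mean(biased) - true_value   # close to 1: the bias remains
```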
# Confounding factors / Batch effects
<center>
![](img/chap10-chap10-r-confounding-1-1.png){height=500}
</center>
# Confounding factors / Batch effects
![](img/batcheffect.png){height=500}
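A small sketch of why a fully confounded design cannot be rescued by the analysis, assuming a hypothetical experiment in which every drug-treated sample was processed in batch B:

```{r}
set.seed(2)
treatment <- factor(rep(c("control", "drug"), each = 6))
batch     <- factor(rep(c("A", "B"), each = 6))   # all drug samples ran in batch B
y <- 5 + 2 * (treatment == "drug") + rnorm(12, sd = 0.5)

fit <- lm(y ~ treatment + batch)
coef(fit)
# the batch coefficient is NA: lm() drops it because it is aliased with treatment,
# so the data cannot tell a drug effect from a batch effect
```

Balancing the design, e.g. three control and three drug samples in each batch, would make both coefficients estimable.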
# Batch effect vs confounding
Confounding need not only be between a biological and a technical variable; it can also be more subtle. For instance, a biomarker might have nothing to do with the disease directly: it might just be a marker of a lifestyle that causes the disease (as well as other things), or of an inflammation that is caused by the disease (as well as by many other things).
# Effect size and replicates
<center>
![](img/chap10-Design-effectsize-1.png){height=500}
</center>
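To see how the number of replicates and the effect size together determine the chance of detecting an effect, one can estimate the power of a two-sample t-test by simulation. A minimal sketch; `sim_power()` is a hypothetical helper and the effect size d = 0.8 is an arbitrary choice:

```{r}
# hypothetical helper: estimate power as the fraction of simulated
# experiments that reach p < alpha
sim_power <- function(n, d, n_sim = 1000, alpha = 0.05) {
  pvals <- replicate(n_sim, {
    x <- rnorm(n, mean = 0, sd = 1)
    y <- rnorm(n, mean = d, sd = 1)   # true difference of d standard deviations
    t.test(x, y)$p.value
  })
  mean(pvals < alpha)
}

set.seed(3)
power_n5  <- sim_power(n = 5,  d = 0.8)
power_n20 <- sim_power(n = 20, d = 0.8)
c(n5 = power_n5, n20 = power_n20)
```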
# Blockbox
Block what you can, randomize what you cannot.
(George Box, 1978)
![](img/chap10-Design-blockbox-1.png){height=400}
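One way to see the benefit of blocking is a paired design, where each individual is its own block. A minimal simulation, assuming large differences between individuals (sd 10) and a small true treatment effect of 1; the numbers are hypothetical:

```{r}
set.seed(4)
n <- 10
block_effect <- rnorm(n, mean = 0, sd = 10)        # large between-individual differences
before <- 50 + block_effect + rnorm(n, sd = 0.2)
after  <- 51 + block_effect + rnorm(n, sd = 0.2)   # true treatment effect of 1

p_unpaired <- t.test(after, before)$p.value
p_paired   <- t.test(after, before, paired = TRUE)$p.value
c(unpaired = p_unpaired, paired = p_paired)
```

The paired test compares each individual with itself, so the between-individual variation cancels out; the unpaired test has to detect the effect through that variation.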
# Replicates
![Figure 13.2 from Book](img/chap10-Design-comparesamplesize-1.png)
# Biological replicates vs technical replicates
- A person is weighed on milligram precision scales, with 20 replicates. He follows the diet, and four weeks later, he is weighed again, with 20 replicates.
- Ten people weigh themselves once on their bathroom scales and report the number. Four weeks later, they weigh themselves and report again.
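A minimal simulation of the two weighing designs, with made-up numbers (average weight change of -1 kg, person-to-person sd of 2 kg, scale errors of 1 mg and 0.5 kg):

```{r}
set.seed(5)
pop_change <- -1   # average weight change in the population (kg)
person_sd  <- 2    # person-to-person variation of the change (kg)

# Design 1: one person, 20 technical replicates on a precise scale (error sd 1 mg)
own_change <- rnorm(1, mean = pop_change, sd = person_sd)
before1 <- 80 + rnorm(20, sd = 1e-6)
after1  <- 80 + own_change + rnorm(20, sd = 1e-6)
est1 <- mean(after1) - mean(before1)   # estimates this one person's change

# Design 2: ten people, one reading each on bathroom scales (error sd 0.5 kg)
baseline <- rnorm(10, mean = 80, sd = 10)
changes  <- rnorm(10, mean = pop_change, sd = person_sd)
before2 <- baseline + rnorm(10, sd = 0.5)
after2  <- baseline + changes + rnorm(10, sd = 0.5)
est2 <- mean(after2 - before2)         # estimates the population change

c(own_change = own_change, design1 = est1, design2 = est2, population = pop_change)
```

Design 1 pins down one person's change almost exactly, but contains no information about how the change varies between people; only design 2, with biological replicates, estimates the population effect.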
# How many replicates do I need?
The package _pwr_ provides power calculations for common tests, e.g.
*pwr.t.test*, *pwr.2p.test*, *pwr.chisq.test*, *pwr.f2.test*
```{r, eval = FALSE}
library("pwr")
str(pwr.t.test)
```
`pwr.t.test()` solves for whichever of `n`, `d`, `sig.level` and `power` is passed as `NULL`: give the power and effect size to get the required sample size, or the sample size and effect size to get the power.
```{r, eval = FALSE}
pwr.t.test(n = 15, d = 0.4, sig.level = 0.05, type = "paired")
```
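If _pwr_ is not installed, base R's `power.t.test()` from the _stats_ package covers t-tests. A sketch, assuming `sd = 1` so that `delta` plays the role of Cohen's d; for this paired setting it should give a value close to the `pwr.t.test()` call above:

```{r}
# power.t.test() solves for the one argument among n, delta, sd,
# sig.level and power that is NULL
pw <- power.t.test(n = 15, delta = 0.4, sd = 1, sig.level = 0.05, type = "paired")
pw$power

# required number of pairs for 80% power
ss <- power.t.test(delta = 0.4, sd = 1, power = 0.8, type = "paired")
ceiling(ss$n)
```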
`d` is the effect size (Cohen's d): the difference between the means divided by the pooled standard deviation.
Test (effect size index) | small | medium | large
-------|-------|--------|-------
tests for proportions (p) | 0.2 | 0.5 | 0.8
tests for means (t) | 0.2 | 0.5 | 0.8
chi-square tests (chisq) | 0.1 | 0.3 | 0.5
correlation test (r) | 0.1 | 0.3 | 0.5
anova (anov) | 0.1 | 0.25 | 0.4
general linear model (f2) | 0.02 | 0.15 | 0.35
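Cohen's d can also be computed directly from data. A minimal sketch for two independent samples; `cohens_d()` is a hypothetical helper, not a function from _pwr_:

```{r}
# hypothetical helper: Cohen's d with the pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

cohens_d(c(3, 4, 5), c(1, 2, 3))   # 2: means differ by 2, pooled sd is 1
```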
# Fold-changes
- fold changes and proportions are ratios
- the denominator is a random variable (it changes from lab to lab and probably from experiment to experiment), which can create high instability and very unequal variances between experiments
- *transformations!* (e.g. work with log fold changes)
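A small sketch of why the log transformation helps, with simulated log-normal numerators and denominators (an assumption made for illustration):

```{r}
# ratios are asymmetric: a doubling and a halving lie at different
# distances from 1, but are symmetric around 0 on the log2 scale
fold_changes <- c(up = 2, none = 1, down = 0.5)
log2(fold_changes)

# a noisy denominator makes the raw ratio right-skewed (mean > median);
# its logarithm is symmetric around 0
set.seed(6)
num <- rlnorm(1e4, meanlog = 0, sdlog = 0.5)
den <- rlnorm(1e4, meanlog = 0, sdlog = 0.5)
ratio <- num / den
c(mean_ratio = mean(ratio), median_ratio = median(ratio))
c(mean_log = mean(log2(ratio)), median_log = median(log2(ratio)))
```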
# Regular and catastrophic noise
- Regular noise can be modelled by simple probability models such as
    - independent normal distributions
    - Poisson
    - mixtures such as gamma–Poisson or Laplace
- These models let us take such noise into account in our data analyses and compute the probability of extraordinarily large or small values.
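As a sketch of how the choice of noise model changes the probability of extreme values, compare the upper tail of a Poisson and a gamma-Poisson (negative binomial) distribution with the same mean of 10; the dispersion `size = 2` is an arbitrary assumption:

```{r}
# P(X > 20) when the mean is 10
# Poisson: variance = mean
p_pois <- ppois(20, lambda = 10, lower.tail = FALSE)
# gamma-Poisson: variance = mu + mu^2 / size
p_nbin <- pnbinom(20, size = 2, mu = 10, lower.tail = FALSE)
c(poisson = p_pois, gamma_poisson = p_nbin)
```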
In the real world, this is only part of the story: measurements can be completely off scale (a sample swap, a contamination or a software bug), and they can go awry all at the same time (a whole microtiter plate went bad, affecting all data measured from it). Such events are hard to model or even correct for; our best chance to deal with them is data quality assessment, outlier detection and documented removal.
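A minimal sketch of outlier detection with documented removal, using robust z-scores based on the median and the MAD; `flag_outliers()` and the cutoff of 3.5 are assumptions for illustration, not a standard:

```{r}
# hypothetical helper: flag values whose robust z-score exceeds the cutoff
flag_outliers <- function(x, cutoff = 3.5) {
  z <- (x - median(x)) / mad(x)
  abs(z) > cutoff
}

# e.g. one contaminated sample among otherwise consistent readings
signal <- c(10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 55.0)
flags  <- flag_outliers(signal)

# record what was removed instead of silently dropping it
removed <- data.frame(index = which(flags), value = signal[flags])
removed
clean <- signal[!flags]
```

This catches isolated outliers; a whole plate that went bad shifts many values at once and needs plate-level quality checks instead.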
# Mean-variance relationships and variance-stabilizing transformations
etc
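A minimal illustration of a mean-variance relationship, assuming Poisson-distributed counts: the raw variance grows with the mean, while the variance after a square-root transformation stays roughly constant at about 1/4:

```{r}
set.seed(7)
lambdas <- c(10, 50, 100)
counts  <- lapply(lambdas, function(l) rpois(1e5, l))

raw_var  <- sapply(counts, var)                          # about equal to lambda
sqrt_var <- sapply(counts, function(x) var(sqrt(x)))     # about 0.25
rbind(lambda = lambdas, raw_var = raw_var, sqrt_var = sqrt_var)
```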