Commit 92831134 authored by Bernd Klaus

added LOESS example and outline for the normalization section

parent 2acedcef
......@@ -36,6 +36,9 @@ library("limma")
library("Single.mTEC.Transcriptomes")
library("DESeq2")
library("tibble")
library("broom")
library("scran")
library("locfit")
theme_set(theme_gray(base_size = 18))
......@@ -138,6 +141,36 @@ scatter_tra +
lm_tra <- lm(tra ~ total_detected, data = tra_detected)
lm_tra
## ----loessExampleLinFit, dependson="fit_model"---------------------------
# create_data
y <- seq(from=1, to=10, length.out=100)
a <- y^3 +y^2 + rnorm(100,mean=0, sd=30)
dataL <- data.frame(a=a, y=y)
qplot(y, a, data = dataL)
# linear fit
linreg <- lm(a~y, data = dataL)
(qplot(y, a, data = dataL) +
geom_abline(slope = tidy(linreg, quick = TRUE)[2,2],
intercept = tidy(linreg, quick = TRUE)[1,2]))
tidy(linreg)
dataL$LinReg <- predict(linreg)
## ----loessExampleFit, dependson="loessExampleLinFit"---------------------
# regress a on y, as in the linear fit, so that both curves share the same axes
dataL$locFit <- predict(locfit(a~lp(y, nn=0.5, deg=1), data=dataL),
newdata = dataL$y)
(qplot(y, a, data = dataL, main = "Linear vs. local regression")
+ geom_line(aes(x = y, y = locFit), color = "dodgerblue3")
+ geom_line(aes(x = y, y = LinReg), color = "coral3"))
## ----session_info, cache = FALSE-----------------------------------------
sessionInfo()
......@@ -64,6 +64,9 @@ library("limma")
library("Single.mTEC.Transcriptomes")
library("DESeq2")
library("tibble")
library("broom")
library("scran")
library("locfit")
theme_set(theme_gray(base_size = 18))
......@@ -311,13 +314,89 @@ We can of course always add more predictors to the linear function. The coeffici
\(b\) is called the __slope__ and \(a\) is called the __intercept__ .
We can fit a linear regression via a call to the function `lm()`. The regression
model is specified using R's formula notation.
```{r regresssion_tra}
lm_tra <- lm(tra ~ total_detected, data = tra_detected)
lm_tra
```
As we can see, the estimated
slope is approximately `r tidy(lm_tra, quick = TRUE)[2, "estimate"]`, indicating
that, on average, a proportion of `r tidy(lm_tra, quick = TRUE)[2, "estimate"]`
of the highly variable genes detected per cell are tra genes.
This is in line with Supplementary Figure 1 of the original publication,
which uses the full set of expressed genes, not just the highly variable
ones.
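As a minimal sketch of how the intercept \(a\) and the slope \(b\) can be read off a fitted model, the following uses simulated data with known coefficients. The data frame `sim` and its columns `x` and `z` are made up for illustration and are not part of the mTEC data. Base R's `coef()` returns the estimates as a named vector.

```r
# Hypothetical simulated data with intercept a = 2 and slope b = 0.5
set.seed(1)
sim <- data.frame(x = seq(0, 10, length.out = 50))
sim$z <- 2 + 0.5 * sim$x + rnorm(50, sd = 0.1)

# Fit z = a + b * x using formula notation
fit_sim <- lm(z ~ x, data = sim)

# Named vector of estimates: "(Intercept)" is a, "x" is b
est <- coef(fit_sim)
est
```

The estimates should be close to the values used in the simulation, up to noise.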
# Local regression (LOESS)
Local regression is a commonly used approach for fitting flexible non-linear
functions. It computes many local linear regression fits and combines
them. It is a very useful technique for both data visualization and
trend fitting. Fitting many local models is computationally demanding,
but it is usually feasible with today's hardware. We illustrate local regression
using the `r CRANpkg("locfit")` package on simulated data.
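The idea of "many local linear fits" can be sketched by hand in base R. The helper `local_linear()` below is hypothetical and deliberately simplified: for a target point it keeps a fraction `nn` of the nearest neighbors, down-weights them with tricube weights (the weighting used in classic LOESS, assumed here for illustration) and fits a weighted straight line. It is not how `locfit` is implemented internally.

```r
# Local linear fit at a single point x0 (simplified sketch, hypothetical helper)
local_linear <- function(pred, resp, x0, nn = 0.5) {
  k <- ceiling(nn * length(pred))             # number of neighbors used
  d <- abs(pred - x0)                         # distances to the target point
  h <- sort(d)[k]                             # distance to the k-th nearest neighbor
  w <- ifelse(d <= h, (1 - (d / h)^3)^3, 0)   # tricube weights, zero outside the window
  fit <- lm(resp ~ pred, weights = w)         # weighted linear regression around x0
  unname(predict(fit, newdata = data.frame(pred = x0)))
}

# Simulated data following a polynomial trend
set.seed(42)
pred <- seq(1, 10, length.out = 100)
resp <- pred^3 + pred^2 + rnorm(100, sd = 30)

# Combine the local fits: one fitted value per observed predictor value
local_fit <- sapply(pred, function(p) local_linear(pred, resp, p, nn = 0.5))
head(local_fit)
```

Each fitted value comes from its own small regression, which is why the combined curve can bend with the data while every individual fit is still a straight line.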
We first fit a linear regression line to simulated
data that follow a polynomial trend and see that it
fits poorly.
```{r loessExampleLinFit, dependson="fit_model"}
# create_data
y <- seq(from=1, to=10, length.out=100)
a <- y^3 +y^2 + rnorm(100,mean=0, sd=30)
dataL <- data.frame(a=a, y=y)
qplot(y, a, data = dataL)
# linear fit
linreg <- lm(a~y, data = dataL)
(qplot(y, a, data = dataL) +
geom_abline(slope = tidy(linreg, quick = TRUE)[2,2],
intercept = tidy(linreg, quick = TRUE)[1,2]))
tidy(linreg)
dataL$LinReg <- predict(linreg)
```
We now use the function `locfit` to perform a local regression on the
data. It takes the predictors wrapped in a call to `lp()`. Within this
function we can also set tuning parameters, such as the degree of the local
polynomial (`deg`). An important one is `nn`,
which sets the proportion of nearest neighbors to be used for the local fits.
The lower this proportion, the more closely the line will follow the data points.
```{r loessExampleFit, dependson="loessExampleLinFit"}
# regress a on y, as in the linear fit, so that both curves share the same axes
dataL$locFit <- predict(locfit(a~lp(y, nn=0.5, deg=1), data=dataL),
newdata = dataL$y)
(qplot(y, a, data = dataL, main = "Linear vs. local regression")
+ geom_line(aes(x = y, y = locFit), color = "dodgerblue3")
+ geom_line(aes(x = y, y = LinReg), color = "coral3"))
```
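To get a feeling for how the neighborhood size affects the fit, the sketch below compares two settings on data simulated in the same way. It uses base R's `loess()` rather than `locfit`, purely so that it runs without extra packages; the `span` argument of `loess()` plays a role analogous to `nn`. The names `sim`, `pred`, `resp` and `rss_for_span()` are made up for this illustration.

```r
# Simulated data following a polynomial trend
set.seed(42)
pred <- seq(1, 10, length.out = 100)
resp <- pred^3 + pred^2 + rnorm(100, sd = 30)
sim <- data.frame(pred = pred, resp = resp)

# Residual sum of squares of a local linear fit for a given span
rss_for_span <- function(span) {
  fit <- loess(resp ~ pred, data = sim, span = span, degree = 1)
  sum(residuals(fit)^2)
}

# A wide neighborhood versus a narrow one
rss <- sapply(c(wide = 0.9, narrow = 0.2), rss_for_span)
rss
```

The narrow neighborhood should leave smaller residuals because the curve tracks the points more closely. This is not automatically better: a very small neighborhood starts to follow the noise.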
# Normalization and confounding factors
## Normalization of single cell data
* use scran size factors
## Confounding factors
* ZINB-WaVE
# Session Info
......