Commit 2acedcef authored by Bernd Klaus

added regression section, TODO: LOESS, PCA, Clustering, Heatmap

parent 2994d0ab
@@ -116,7 +116,7 @@ tra_detected <- filter(mtec_counts_tidy, is_detected == TRUE,
tra_detected
## ----tra_vs_all----------------------------------------------------------
## ----travsall, fig.cap="Total number of genes vs TRA"--------------------
scatter_tra <- ggplot(tra_detected, aes(x = total_detected, y = tra)) +
  geom_point() +
  coord_equal()
@@ -131,8 +131,13 @@ scatter_tra +
ggplot(tra_detected, aes(x = total_detected, y = tra))
scatter_tra +
  geom_smooth(color = "coral3") +
  geom_smooth(method = "lm")
## ----regression_tra------------------------------------------------------
lm_tra <- lm(tra ~ total_detected, data = tra_detected)
lm_tra
## ----session_info, cache = FALSE-----------------------------------------
sessionInfo()
@@ -152,7 +152,7 @@ load(file.path(data_dir, "tras.RData"))
# Graphics in R
# ggplot2
# ggplot2{#gg}
`r CRANpkg("ggplot2")` is a package by Hadley Wickham that implements the idea of a
*grammar of graphics* -- a concept created by Leland Wilkinson in his book of the same name. Comprehensive documentation for the package can
@@ -213,7 +213,7 @@ tra_detected
We visualize this relationship in a scatterplot with ggplot2.
```{r tra_vs_all}
```{r travsall, fig.cap="Total number of genes vs TRA"}
scatter_tra <- ggplot(tra_detected, aes(x = total_detected, y = tra)) +
  geom_point() +
  coord_equal()
@@ -251,7 +251,7 @@ We set (instead of mapped) the alpha value to 0.2, which increases the
transparency of the rugs and alleviates overplotting.
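The rug layer itself is not visible in this excerpt; a minimal sketch of what such a call might look like (assuming the `scatter_tra` object defined above; the chunk label is chosen here for illustration) is:
```{r rug_alpha_sketch}
# alpha is *set* to a constant outside of aes(), not *mapped* to a variable,
# so every rug line is drawn with the same 20% opacity
scatter_tra +
  geom_rug(alpha = 0.2)
```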
# Exercise: Geometries and aspect ratios
## Exercise: Geometries and aspect ratios
1. What happens if you omit the calls to `geom_point()` and
`coord_equal()`?
@@ -265,9 +265,59 @@ to a regression line?
ggplot(tra_detected, aes(x = total_detected, y = tra))
scatter_tra +
  geom_smooth(color = "coral3") +
  geom_smooth(method = "lm")
```
As Figure \@ref(fig:travsall) shows, the two numbers are strongly correlated
and appear to follow a linear relationship,
indicating that the number of TRA genes detected within a cell
is proportional to the total number of detected genes. Regression models
allow us to characterize such relationships systematically.
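To complement the visual impression, one could also quantify the association, for example with a correlation coefficient (a minimal sketch using the `tra_detected` columns defined above; the chunk label is illustrative):
```{r cor_tra_sketch}
# Pearson correlation between the total number of detected genes
# and the number of detected TRA genes
cor(tra_detected$total_detected, tra_detected$tra)
```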
# Regression models
In regression we use one variable to explain or predict the other. It is customary
to plot the predictor variable on the x--axis and the predicted variable on the
y--axis.
The predictor is also called the independent variable, the explanatory variable,
the covariate, or simply x. The predicted variable is called the dependent
variable, or simply y.
In a regression problem the data are pairs \((x_i, y_i)\) for \(i = 1, \dotsc, n\).
For each \(x_i\), \(y_i\) is a random variable whose distribution depends on \(x_i\).
The regression model is:
\[
y_i = g(x_i) + \varepsilon_i .
\]
The above expresses \(y_i\) as a systematic part \(g(x_i)\)
and an unexplained part
\(\varepsilon_i\). Or, more informally:
__response = signal + noise__
\(g\) is called the regression function, and regression models provide means
to estimate \(g\). A common choice for \(g\) is a linear function. If we have only a
single predictor \(x\), the simple linear regression model is:
\[
y_i = a + b x_{i} + \varepsilon_i, \quad
\varepsilon_i \sim N(0, \sigma).
\]
We can of course always add more predictors to the linear function. The coefficient
\(b\) is called the __slope__ and \(a\) is called the __intercept__.
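As a concrete illustration of the model (a sketch that is not part of the analysis; the intercept, slope, and noise standard deviation below are arbitrary), we can simulate data that follow it:
```{r sim_lm_sketch}
set.seed(1)
n <- 100
x <- runif(n, 0, 10)                # predictor values
a <- 2                              # intercept (arbitrary)
b <- 0.5                            # slope (arbitrary)
y <- a + b * x + rnorm(n, sd = 1)   # response = signal + noise
head(data.frame(x, y))
```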
We can fit a linear regression via a call to the function `lm()`. The regression
model is specified using R's formula notation.
```{r regression_tra}
lm_tra <- lm(tra ~ total_detected, data = tra_detected)
lm_tra
```
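The printed coefficients are the estimated intercept \(a\) and slope \(b\). As a follow-up sketch (not part of the original code; `scatter_tra` and `lm_tra` are the objects created above), one might inspect the fit further and overlay the fitted line on the scatterplot:
```{r lm_tra_inspect_sketch}
coef(lm_tra)      # estimated intercept and slope
summary(lm_tra)   # standard errors, t statistics, R-squared
# add the fitted regression line to the scatterplot
scatter_tra +
  geom_abline(intercept = coef(lm_tra)[1],
              slope = coef(lm_tra)[2],
              color = "coral3")
```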
# Session Info