Commit 4d15acfd by Bernd Klaus

### finished theoretical tSNE section, cleaned up MDS section

... ...
@@ -1008,11 +1008,11 @@ the Euclidean distance.
-# Dimensionality reduction 2: Non--metric Kruskal scaling for single cell RNA--Seq data
+# Dimensionality reduction 2: Kruskal scaling for single cell RNA--Seq data
 Here, we use the cell--cycle corrected single cell RNA--seq data from
 [Buettner et al.](http://dx.doi.org/10.1038/nbt.3102). The authors
-use a dataset studying the differentiation of T--cells. The authors found
+use a dataset studying the differentiation of T--cells. They found
 that the cell cycle had a profound impact on the gene expression for this
 data set and developed a factor analysis--type method (a regression--type model
 where the coefficients are random and follow a distribution)
... ...
@@ -1037,13 +1037,15 @@ In other words, given distances $$d_{i,j}$$ between two cells, we want to find
 margin^[There are many variants of MDS; for a review see Buja et al., 2007]
-Where theta is a monotone transformation of the input distances. This
+Where theta is a monotone transformation of the input distances, allowing us to
+represent the "typical range" of distances more faithfully. This
 goodness of fit can then be measured by the cost function known as __stress__.
-$\text{stress}(\hat d_{i,j}}) = \sqrt{\sum_{i \neq j} \frac{ (\theta( d_{i,j}) - \hat d_{i,j})^2}{\hat d_{i,j}^2}}$
+\[ \text{STRESS}_{Kruskal} = \sqrt{ \frac{ \sum_{i \neq j} [\theta( d_{i,j}) - \hat d_{i,j}]^2}{ \sum_{i \neq j} \hat d_{i,j}^2}} \]
+margin^[The stress denominator is only used for standardization and makes sure that the stress is between 0 and 1.]
 Where $$\hat d_{i,j}$$ are the distances fitted by the non--metric
 distance scaling algorithm. Our procedure is "non--metric" as we approximate
... ...
@@ -1205,7 +1207,7 @@ ggplot(data_dist, aes(x = org_distance, y = mds_distance)) +
-### Exercise: Metric--Sammon scaling
+### Exercise: Sammon scaling
 [Sammon (1969)](https://en.wikipedia.org/wiki/Sammon_mapping) developed an alternative __metric__ (distances fit directly)
... ...
@@ -1213,12 +1215,16 @@ scaling algorithm which optimizes a different stress function:
 \[ E = \frac{1}{\sum\limits_{i
... ...
 popular for single cell data ([viSNE](http://dx.doi.org/10.1038/nbt.2594)), but challenging to choose the parameters
+In summary, while t-SNE has been used to reveal structure, its parameters, especially the perplexity, are hard to set.
+Any results of a t-SNE analysis should thus be verified by an independent analysis (e.g. MDS or PCA).
 ## Inferring cell hierarchies
 Single cell data is commonly used to infer (developmental) hierarchies of single cells.
 For a nice Bioconductor package wrapping a lot of dimension reduction techniques
 for single cell data, see `r Biocpkg("sincell")` and the [associated article](http://dx.doi.org/10.1093/bioinformatics/btv368).
... ...
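
To illustrate the Kruskal stress formula from the MDS hunk of this commit, here is a minimal R sketch. It is not code from the commit. It assumes that the monotone transformation $\theta$ is estimated by isotonic regression (`isoreg` from base R) of the fitted distances on the rank order of the input distances, which is one common way to obtain it. The function name `kruskal_stress`, the simulated data, and the use of `cmdscale` to produce a two-dimensional configuration are all assumptions made for illustration.

```r
# Kruskal stress for a given low-dimensional configuration (illustrative sketch).
# d_orig: input distances d_{i,j} (class "dist")
# d_fit:  distances \hat d_{i,j} between the embedded points (class "dist")
kruskal_stress <- function(d_orig, d_fit) {
  d     <- as.vector(d_orig)
  d_hat <- as.vector(d_fit)
  ord   <- order(d)

  # Assumption: theta(d) is the isotonic (monotone) regression of the
  # fitted distances, taken in the rank order of the input distances.
  theta <- numeric(length(d))
  theta[ord] <- isoreg(d_hat[ord])$yf

  # Numerator: squared misfit; denominator: standardization only.
  sqrt(sum((theta - d_hat)^2) / sum(d_hat^2))
}

# Hypothetical toy data: 50 "cells" with 5 features each.
set.seed(1)
x      <- matrix(rnorm(50 * 5), nrow = 50)
d_orig <- dist(x)

# Any 2D configuration can be scored; classical MDS is used here for simplicity.
conf   <- cmdscale(d_orig, k = 2)
stress <- kruskal_stress(d_orig, dist(conf))
stress
```

Because each pair appears once in a `dist` object, the sums run over $i < j$ rather than $i \neq j$; the factor of two cancels in the ratio, so the value is the same. A configuration whose distances are already a monotone function of the input distances has a stress of zero.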