Commit 4ab56964 authored by Bernd Klaus

small edits and extensions

parent 27926c0b
......@@ -28,7 +28,7 @@ cache = TRUE)
1. Starting point: Raw data in various formats
2. Import, reshape the data into required formats
3. Perfom computations on your data, plot it
3. Perform computations on your data, plot it
* We will make use of the [tidyverse](http://tidyverse.org/): a set of packages that make using R easier
......@@ -108,33 +108,39 @@ z <- x + y
z
```
* The most basic elementary data structure in R are vectors:
* The most basic data structures in R are vectors
* can be created by concatenating individual elements via the **c** function
```{r vectors}
x <- c(7.5, 8.2, 3.1, 5.6, 8.2)
```
* can be created by concatenating individual elements via the **c** function
--------
* Subsets are created by the bracket operator (1-based counting!)
* The indices of the elements you want to access are given inside the square brackets
* __head__ gives the first 6 elements of a data structure
```{r vector access}
head(x)
x[c(1, 2, 4)]
x[-(1:3)]
head(x)
```
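
The bracket rules above can be sketched on a small vector. This is a minimal illustration, not part of the original slides; the name `v` and the logical-index line are assumptions added for the example.

```r
# Hypothetical demo vector with the same values as x in the slides
v <- c(7.5, 8.2, 3.1, 5.6, 8.2)

first_two <- v[1:2]      # positive indices select elements (counting starts at 1)
without_first <- v[-1]   # negative indices drop the given elements
large <- v[v > 6]        # a logical vector keeps the elements where it is TRUE
```

Positive and negative indices cannot be mixed in one call, so a selection is written either as "keep these" or as "drop these".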
## Matrices in R
Matrices are two--dimensional vectors, the simplest is to create the columns
and then glue them together with the command __cbind__
* Matrices are two--dimensional vectors
* the simplest is to create the columns and then glue them together with the command __cbind__
```{r cbind-ex, echo = TRUE}
x <- c(5, 7 , 9)
y <- c(6, 3 , 4)
x <- c(5, 7, 9)
y <- c(6, 3, 4)
z <- cbind(x, y)
z
dim(z)
......@@ -142,12 +148,9 @@ dim(z)
-----
* Access is now two--dimensional
* Access now works by specifying indices for rows and columns
```{r cbind-ex_acces, echo = TRUE}
x <- c(5, 7 , 9)
y <- c(6, 3 , 4)
z <- cbind(x, y)
z[c(1,2), ]
z[, -1]
z[2, ]
......@@ -157,7 +160,7 @@ z[2, ]
## Data frames (tibbles) and lists
* A data frame is a matrix where the columns can have different data types
* A data frame is a matrix where the __columns can have different data types__
* Rows represent the samples and columns the variables
......@@ -165,7 +168,8 @@ z[2, ]
---
* example: import a small csv table
* import a small csv table with __read_csv__ from the __readr__ package
```{r load-Patients, echo = TRUE}
pat <- read_csv("http://www-huber.embl.de/users/klaus/BasicR/Patients.csv")
......@@ -185,7 +189,7 @@ select(pat_tiny, PatientId, Height, Gender)
There are a couple of operators useful for comparisons:
* `Variable == value`: equal
* `Variable != value`: un--equal
* `Variable != value`: unequal
* `Variable < value`: less
* `Variable > value`: greater
* `&`: and
......@@ -193,6 +197,16 @@ There are a couple of operators useful for comparisons:
* `!`: negation
* `%in%`: is element?
----
* You can also access data frames via indices or row / column names
```{r df_access_old}
pat[2, c("PatientId", "Height")]
pat["P2", c(1, 2)]
```
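
In the rendered output of this document, the row-name lookup `pat["P2", c(1, 2)]` returns a row of `NA`, since tibbles do not carry row names. Selecting rows by comparing a column to a value avoids this. The sketch below uses a hypothetical stand-in table `pat_demo` (a base data frame; only the `P2` / `1.9` pair comes from the slides, the other values are invented for illustration).

```r
# Hypothetical stand-in for the Patients table
pat_demo <- data.frame(
  PatientId = c("P1", "P2", "P3"),
  Height    = c(1.65, 1.90, 1.78),
  stringsAsFactors = FALSE
)

# Select rows by comparing a column to a value, and columns by name
p2 <- pat_demo[pat_demo$PatientId == "P2", c("PatientId", "Height")]

# Conditions can be combined with & and %in%
tall_subset <- pat_demo[pat_demo$Height > 1.7 &
                          pat_demo$PatientId %in% c("P1", "P2"), ]
```

With dplyr loaded, `filter(pat_demo, PatientId == "P2")` expresses the same row selection.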
## Vectors with arbitrary contents: Lists
......@@ -205,7 +219,9 @@ L
------------------
* access via the double bracket operator
* access works via the double bracket operator
* recursively accesses the list contents
<http://r4ds.had.co.nz/vectors.html#visualising-lists>
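
The difference between single and double brackets can be sketched as follows. The list `demo_list` and its element names are hypothetical and chosen only for this illustration.

```r
# Hypothetical list with a nested list inside
demo_list <- list(one = 1, two = c(1, 2), nested = list(a = "x", b = 3:5))

single  <- demo_list["two"]      # single bracket: still a list, of length 1
content <- demo_list[["two"]]    # double bracket: the numeric vector itself

# Two ways to reach an element of the nested list
deep      <- demo_list[["nested"]][["b"]]   # chained double brackets
deep_same <- demo_list[[c("nested", "b")]]  # index vector applied recursively
```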
......@@ -218,7 +234,7 @@ L[["two"]]
---
* data frames = special lists;
* data frames = special lists
=> can be accessed in the same way
......@@ -242,7 +258,7 @@ bodyfat
-----
* the function **map** from the **purrr** package applies another function to every element of a list
* the function **map** from the **purrr** package applies another function to every element of a list or vector
* The following code computes the mean value for every variable
......@@ -253,19 +269,22 @@ head(map_dbl(bodyfat, mean))
## Custom functions
* The map functions are really useful for applying your custom functions
* The map functions are useful for applying your custom functions
* function template:
* function template for R:
```{r function_template, eval=FALSE}
function_name <- function(argument_1, argument_2,
optional_argument = defautl_value )
optional_argument = default_value )
{
return(...)
}
```
* the return statement is optional, by default the result of the last computation
is returned
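
A minimal sketch of the template with and without an explicit return statement; the function names and the `scale` argument are hypothetical and serve only as an illustration.

```r
# Hypothetical helper: half the range of a numeric vector, optionally scaled
half_range <- function(values, scale = 1) {
  spread <- max(values) - min(values)
  scale * spread / 2   # the last evaluated expression is the return value
}

# The same function with an explicit return()
half_range_explicit <- function(values, scale = 1) {
  spread <- max(values) - min(values)
  return(scale * spread / 2)
}
```

Both versions give the same result; the optional argument falls back to its default when it is not supplied, as in `half_range(c(1, 5))`.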
---
* let's compute a robust z--score for every variable
......@@ -280,7 +299,7 @@ map_df(bodyfat, robust_z)
---
* you can define a function implicitly via the **.x** and **.y** arguments
* you can define a function implicitly via the __.x__ and __.y__ arguments
```{r robust_z_implicit}
map_df(bodyfat, ~ (.x - median(.x)) / mad(.x))
......@@ -305,8 +324,8 @@ select(bodyfat, height, height_m, weight, weight_kg)
## Simple plotting in R: "qplot" of ggplot2
* The package **ggplot2** allows very flexible plotting in R
* it takes a while to get acquainted with the underlying "grammer of graphics"
* we will use its function **`qplot()`** for "quick plotting"
* however, it takes a while to get acquainted with the underlying "grammar of graphics"
* we will introduce its function **`qplot()`** for "quick plotting"
```{r qplot, eval=FALSE}
......@@ -335,7 +354,8 @@ include point, line, boxplot, histogram etc.
## A qplot example using the bodyfat data
* plot of **perc.fat** against abdomen circumference
* plot of **percent.fat** against __abdomen.circum__
* we use **cut** to bin the weight data
```{r qplot_example, fig.show='hide'}
bodyfat <- mutate(bodyfat, weight_binned = cut(weight_kg, 5))
......@@ -354,11 +374,9 @@ qplot(abdomen.circum, percent.fat,
* The same data plotted using facets
```{r qplot_example_facets, fig.show='hide'}
qplot(abdomen.circum, percent.fat,
color = weight_binned, data = bodyfat,
facets = ~weight_binned)
```
<img src="Slides_tidyverse_R_intro_files/figure-slidy/qplot_example_facets-1.png" alt="Mountain View" height="75%" width="75%">
......@@ -399,6 +417,6 @@ Note however, that you should typically resort to `map` function for this
purpose as this leads to more readable code:
```{r maps_for}
map_dbl(h, ~.x*10)
map_dbl(h, ~.x * 10)
```
......@@ -65,7 +65,7 @@ Bernd Klaus
<ol style="list-style-type: decimal">
<li>Starting point: Raw data in various formats</li>
<li>Import, reshape the data into required formats</li>
<li>Perfom computations on your data, plot it</li>
<li>Perform computations on your data, plot it</li>
</ol>
<ul>
<li><p>We will make use of the <a href="http://tidyverse.org/">tidyverse</a>: a set of packages that make using R easier</p></li>
......@@ -124,30 +124,33 @@ z &lt;-<span class="st"> </span>x +<span class="st"> </span>y
z</code></pre></div>
<pre><code> [1] 10</code></pre>
<ul>
<li>The most basic elementary data structure in R are vectors:</li>
<li><p>The most basic data structures in R are vectors</p></li>
<li><p>can be created by concatenating individual elements via the <strong>c</strong> function</p></li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="fl">7.5</span>, <span class="fl">8.2</span>, <span class="fl">3.1</span>, <span class="fl">5.6</span>, <span class="fl">8.2</span>)</code></pre></div>
<ul>
<li>can be created by concatenating individual elements via the <strong>c</strong> function</li>
</ul>
</div>
<div class="slide section level2">
 
<ul>
<li>Subsets are created by the bracket operator (1-based counting!)</li>
<li><p>Subsets are created by the bracket operator (1-based counting!)</p></li>
<li><p>The indices of the elements you want to access are given inside the square brackets</p></li>
<li><p><strong>head</strong> gives the first 6 elements of a data structure</p></li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">head</span>(x)</code></pre></div>
<pre><code> [1] 7.5 8.2 3.1 5.6 8.2</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x[<span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">4</span>)]</code></pre></div>
<pre><code> [1] 7.5 8.2 5.6</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x[-(<span class="dv">1</span>:<span class="dv">3</span>)]</code></pre></div>
<pre><code> [1] 5.6 8.2</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">head</span>(x)</code></pre></div>
<pre><code> [1] 7.5 8.2 3.1 5.6 8.2</code></pre>
</div>
<div id="matrices-in-r" class="slide section level2">
<h1>Matrices in R</h1>
<p>Matrices are two–dimensional vectors, the simplest is to create the columns and then glue them together with the command <strong>cbind</strong></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">5</span>, <span class="dv">7</span> , <span class="dv">9</span>)
y &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">6</span>, <span class="dv">3</span> , <span class="dv">4</span>)
<ul>
<li><p>Matrices are two–dimensional vectors</p></li>
<li><p>the simplest is to create the columns and then glue them together with the command <strong>cbind</strong></p></li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">5</span>, <span class="dv">7</span>, <span class="dv">9</span>)
y &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">6</span>, <span class="dv">3</span>, <span class="dv">4</span>)
z &lt;-<span class="st"> </span><span class="kw">cbind</span>(x, y)
z</code></pre></div>
<pre><code> x y
......@@ -160,12 +163,9 @@ z</code></pre></div>
<div class="slide section level2">
 
<ul>
<li>Access is now two–dimensional</li>
<li>Access now works by specifying indices for rows and columns</li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">5</span>, <span class="dv">7</span> , <span class="dv">9</span>)
y &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">6</span>, <span class="dv">3</span> , <span class="dv">4</span>)
z &lt;-<span class="st"> </span><span class="kw">cbind</span>(x, y)
z[<span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">2</span>), ]</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">z[<span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">2</span>), ]</code></pre></div>
<pre><code> x y
[1,] 5 6
[2,] 7 3</code></pre>
......@@ -178,7 +178,7 @@ z[<span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">2</span>)
<div id="data-frames-tibbles-and-lists" class="slide section level2">
<h1>Data frames (tibbles) and lists</h1>
<ul>
<li><p>A data frame is a matrix where the columns can have different data types</p></li>
<li><p>A data frame is a matrix where the <strong>columns can have different data types</strong></p></li>
<li><p>Rows represent the samples and columns the variables</p></li>
<li><p>the tidyverse equivalent of a <strong>data frame</strong> is called a <strong>tibble</strong></p></li>
</ul>
......@@ -186,7 +186,7 @@ z[<span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">2</span>)
<div class="slide section level2">
 
<ul>
<li>example: import a small csv table</li>
<li>import a small csv table with <strong>read_csv</strong> from the <strong>readr</strong> package</li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">pat &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;http://www-huber.embl.de/users/klaus/BasicR/Patients.csv&quot;</span>)</code></pre></div>
<pre><code> Parsed with column specification:
......@@ -219,7 +219,7 @@ z[<span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">2</span>)
<p>There are a couple of operators useful for comparisons:</p>
<ul>
<li><code>Variable == value</code>: equal</li>
<li><code>Variable != value</code>: unequal</li>
<li><code>Variable != value</code>: unequal</li>
<li><code>Variable &lt; value</code>: less</li>
<li><code>Variable &gt; value</code>: greater</li>
<li><code>&amp;</code>: and</li>
......@@ -228,6 +228,22 @@ z[<span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">2</span>)
<li><code>%in%</code>: is element?</li>
</ul>
</div>
<div class="slide section level2">
<ul>
<li>You can also access data frames via indices or row / column names</li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">pat[<span class="dv">2</span>, <span class="kw">c</span>(<span class="st">&quot;PatientId&quot;</span>, <span class="st">&quot;Height&quot;</span>)]</code></pre></div>
<pre><code> # A tibble: 1 × 2
PatientId Height
&lt;chr&gt; &lt;dbl&gt;
1 P2 1.9</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">pat[<span class="st">&quot;P2&quot;</span>, <span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">2</span>)]</code></pre></div>
<pre><code> # A tibble: 1 × 2
PatientId Height
&lt;chr&gt; &lt;dbl&gt;
1 &lt;NA&gt; NA</code></pre>
</div>
<div id="vectors-with-arbitrary-contents-lists" class="slide section level2">
<h1>Vectors with arbitrary contents: Lists</h1>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">L &lt;-<span class="st"> </span><span class="kw">list</span>(<span class="dt">one =</span> <span class="dv">1</span>, <span class="dt">two =</span> <span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">2</span>), <span class="dt">five =</span> <span class="kw">seq</span>(<span class="dv">1</span>, <span class="dv">4</span>, <span class="dt">length =</span> <span class="dv">5</span>),
......@@ -249,7 +265,8 @@ L</code></pre></div>
<div class="slide section level2">
 
<ul>
<li>access via the double bracket operator</li>
<li><p>access works via the double bracket operator</p></li>
<li><p>recursively accesses the list contents</p></li>
</ul>
<p><a href="http://r4ds.had.co.nz/vectors.html#visualising-lists" class="uri">http://r4ds.had.co.nz/vectors.html#visualising-lists</a></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">names</span>(L)</code></pre></div>
......@@ -264,7 +281,7 @@ L</code></pre></div>
<div class="slide section level2">
 
<ul>
<li>data frames = special lists;</li>
<li>data frames = special lists</li>
</ul>
<p>=&gt; can be accessed in the same way</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">pat$Height</code></pre></div>
......@@ -304,7 +321,7 @@ bodyfat</code></pre></div>
<div class="slide section level2">
 
<ul>
<li><p>the function <strong>map</strong> from the <strong>purrr</strong> package applies another function to every element of a list</p></li>
<li><p>the function <strong>map</strong> from the <strong>purrr</strong> package applies another function to every element of a list or vector</p></li>
<li><p>The following code computes the mean value for every variable</p></li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">head</span>(<span class="kw">map_dbl</span>(bodyfat, mean))</code></pre></div>
......@@ -314,14 +331,17 @@ bodyfat</code></pre></div>
<div id="custom-functions" class="slide section level2">
<h1>Custom functions</h1>
<ul>
<li><p>The map functions are really useful for applying your custom functions</p></li>
<li><p>function template:</p></li>
<li><p>The map functions are useful for applying your custom functions</p></li>
<li><p>function template for R:</p></li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">function_name &lt;-<span class="st"> </span>function(argument_1, argument_2,
<span class="dt">optional_argument =</span> defautl_value )
<span class="dt">optional_argument =</span> default_value )
{
<span class="kw">return</span>(...)
}</code></pre></div>
<ul>
<li>the return statement is optional, by default the result of the last computation is returned</li>
</ul>
</div>
<div class="slide section level2">
 
......@@ -407,8 +427,8 @@ bodyfat &lt;-<span class="st"> </span><span class="kw">mutate</span>(bodyfat, <s
<h1>Simple plotting in R: “qplot” of ggplot2</h1>
<ul>
<li>The package <strong>ggplot2</strong> allows very flexible plotting in R</li>
<li>it takes a while to get acquainted with the underlying “grammer of graphics”</li>
<li>we will use its function <strong><code>qplot()</code></strong> for “quick plotting”</li>
<li>however, it takes a while to get acquainted with the underlying “grammar of graphics”</li>
<li>we will introduce its function <strong><code>qplot()</code></strong> for “quick plotting”</li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">qplot</span>(x, <span class="dt">y =</span> <span class="ot">NULL</span>, ..., data, <span class="dt">facets =</span> <span class="ot">NULL</span>,
<span class="ot">NA</span>), ylim =<span class="st"> </span><span class="kw">c</span>(<span class="ot">NA</span>, <span class="ot">NA</span>), log =<span class="st"> &quot;&quot;</span>, main =<span class="st"> </span><span class="ot">NULL</span>,
......@@ -431,7 +451,8 @@ bodyfat &lt;-<span class="st"> </span><span class="kw">mutate</span>(bodyfat, <s
<div id="a-qplot-examples-using-the-bodyfat-data" class="slide section level2">
<h1>A qplot example using the bodyfat data</h1>
<ul>
<li>plot of <strong>perc.fat</strong> against abdomen circumference</li>
<li>plot of <strong>percent.fat</strong> against <strong>abdomen.circum</strong></li>
<li>we use <strong>cut</strong> to bin the weight data</li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">bodyfat &lt;-<span class="st"> </span><span class="kw">mutate</span>(bodyfat, <span class="dt">weight_binned =</span> <span class="kw">cut</span>(weight_kg, <span class="dv">5</span>))
 
......@@ -481,7 +502,7 @@ s &lt;-<span class="st"> </span><span class="kw">numeric</span>() <span class="c
s</code></pre></div>
<pre><code> [1] 10 20 30 40 50 60 70 80</code></pre>
<p>Note however, that you should typically resort to <code>map</code> function for this purpose as this leads to more readable code:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">map_dbl</span>(h, ~.x*<span class="dv">10</span>)</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">map_dbl</span>(h, ~.x *<span class="st"> </span><span class="dv">10</span>)</code></pre></div>
<pre><code> [1] 10 20 30 40 50 60 70 80</code></pre>
</div>
 
......