Commit 27926c0b authored by Bernd Klaus

small edits and first version of slides

parent 4dbb5ad9
......@@ -5,3 +5,6 @@ Predoc_Course_2016_files/
omics_practicals.pdf
R-lab_files/
R-lab_cache/
Slides_tidyverse_R_intro/Slides_tidyverse_R_intro_cache/
Slides_tidyverse_R_intro/Slides_tidyverse_R_intro_files/
......@@ -158,7 +158,7 @@ pat_xls
str(pat_xls)
## ----subset_data---------------------------------------------------------
pat_tiny <- filter(pat, Height < 1.5)
pat_tiny <- filter(pat, Height < 1.7)
select(pat_tiny, PatientId, Height, Gender)
## ----light_and_small_patients, echo = TRUE-----------------------------
......@@ -184,6 +184,7 @@ L[2]
## ----list-example_df, echo = TRUE--------------------------------------
pat$Height
pat[[2]]
pat[["Gender"]]
## ----loadBodyfat, echo = TRUE------------------------------------------
load(url("http://www-huber.embl.de/users/klaus/BasicR/bodyfat.rda"))
......@@ -218,6 +219,8 @@ inch_to_m <- 0.0254
bodyfat <- mutate(bodyfat, height_m = height * inch_to_m,
weight_kg = weight * pb_to_kg)
select(bodyfat, height, height_m, weight, weight_kg)
## ----predicate_functions-------------------------------------------------
keep(pat, is_double)
map_dbl(discard(pat, is_character), mean, na.rm = TRUE)
......@@ -256,11 +259,11 @@ qplot(x, y, data = hex_grid, color = lab, asp = 1) +
## ----if_example, echo = TRUE, eval = TRUE------------------------------
w = 3
w <- 3
if (w < 5) {
d = 2
d <- 2
} else {
d = 10
d <- 10
}
d
......
......@@ -547,7 +547,7 @@ command, we get all the patients that are less tall than 1.5 and select their He
Gender as well as their Id:
```{r subset_data}
pat_tiny <- filter(pat, Height < 1.5)
pat_tiny <- filter(pat, Height < 1.7)
select(pat_tiny, PatientId, Height, Gender)
```
......@@ -619,18 +619,19 @@ they can actually be accessed in the same way.
```{r list-example_df, echo = TRUE}
pat$Height
pat[[2]]
pat[["Gender"]]
```
More on lists can be found in the respective chapter of "R for data science"
[here](http://r4ds.had.co.nz/vectors.html#lists).
## Applying a function to elemnts of a data structure
## Applying a function to elements of a data structure
R encourages the use of functions for programming. Instead of e.g. looping through
a vector or data frame, you can use specialized functions that apply another function
to each element of your data. These kinds of functions are called apply functions.
Here, we will use the `map` family
of functions from the `r CRANpkg("purr")` package instead of the base R functions.
of functions from the `r CRANpkg("purrr")` package instead of the base R functions.
An apply / map call applies a function to a vector or list and returns the result in
another vector/list. Thus, each step consists of "mapping" a list value to a result.
......@@ -718,7 +719,7 @@ head(map_df(bodyfat, ~ (.x - median(.x)) / mad(.x)), 3)
Often, we want to use variables stored in our data set to compute derived quantities.
For example, we might be interested in the weight in kilograms instead of pounds
and the height in meters instead of inches. The function `transform` allows us to
and the height in meters instead of inches. The function `mutate` allows us to
do this.
```{r transform_example}
......@@ -727,6 +728,8 @@ inch_to_m <- 0.0254
bodyfat <- mutate(bodyfat, height_m = height * inch_to_m,
weight_kg = weight * pb_to_kg)
select(bodyfat, height, height_m, weight, weight_kg)
```
We often want to apply our function only to variables in the data set that are
......@@ -756,11 +759,11 @@ __Exercise: Handling a small data set__
* Calculate the `BMI = Weight / Height^2` of all the patients.
# Simple plotting in R: qlot and `r CRANpkg("ggplot2")`
# Simple plotting in R: qplot of `r CRANpkg("ggplot2")`
The package `r CRANpkg("ggplot2")` allows very flexible plotting in R,
but takes a while to get acquainted with the underlying grammar
of graphics. Thus, will use its function `qplot()` for "quick plotting",
of graphics. Thus, we will use its function `qplot()` for "quick plotting",
which requires no knowledge of the underlying advanced features and behaves
much like R's default `plot` function.
However, it offers advanced options like facetting or coloring by condition
......@@ -851,11 +854,11 @@ when the condition is not met):
```{r if_example, echo = TRUE, eval = TRUE}
w = 3
w <- 3
if (w < 5) {
d = 2
d <- 2
} else {
d = 10
d <- 10
}
d
```
......
---
title: "R introduction using the tidyverse"
author: "Bernd Klaus"
date: "November 29, 2016"
output:
slidy_presentation:
df_print: tibble
highlight: kate
font_adjustment: +1
fig_width: 6
fig_height: 3.71
---
```{r setup, include=FALSE}
library(knitr)
library(tidyverse)
options(digits = 3, width = 80)
opts_chunk$set(echo = TRUE, tidy = FALSE, include = TRUE,
dev = 'png', fig.width = 6, fig.asp = 0.618,
comment = ' ', dpi = 300,
cache = TRUE)
```
## What are we going to learn? {.bigger}
* The course is designed from a data analysis point of view:
1. Starting point: Raw data in various formats
2. Import and reshape the data into the required formats
3. Perform computations on your data and plot it
* We will make use of the [tidyverse](http://tidyverse.org/): a set of packages that make using R easier
* We draw a lot of inspiration from "R for data science"
<http://r4ds.had.co.nz/>
## Our first tool: R Markdown {.smaller}
* Markdown is a simple formatting syntax for authoring HTML, PDF, and
MS Word documents: <http://rmarkdown.rstudio.com>
### Text formatting
```
*italic* or _italic_
**bold** __bold__
```
### Headings
```
# 1st Level Header
## 2nd Level Header
```
### Lists
```
* Bulleted list item 1
* Item 2
1. Numbered list item 1
1. Item 2. The numbers are incremented automatically in the output.
```
----------
### Links and images
```
<http://example.com>
[linked phrase](http://example.com)
![optional caption text](path/to/img.png)
```
### Tables
```
First Header | Second Header
------------- | -------------
Content Cell | Content Cell
Content Cell | Content Cell
```
----------
* __R Markdown is a mix of R code and markdown comments__
* In RStudio, R Markdown can be run in a __notebook mode__
* ... better than simple text based "scripts"
## Simple arithmetics and vectors
```{r simple_arithmetics}
x <- 6
y <- 4
z <- x + y
z
```
* The most basic data structure in R is the vector:
```{r vectors}
x <- c(7.5, 8.2, 3.1, 5.6, 8.2)
```
* can be created by concatenating individual elements via the **c** function
--------
* Subsets are created by the bracket operator (1-based counting!)
```{r vector access}
x[c(1, 2, 4)]
x[-(1:3)]
head(x)
```
## Matrices in R
Matrices are two--dimensional vectors. The simplest way to build one is to create the columns
and then glue them together with the command __cbind__
```{r cbind-ex, echo = TRUE}
x <- c(5, 7 , 9)
y <- c(6, 3 , 4)
z <- cbind(x, y)
z
dim(z)
```
-----
* Access is now two--dimensional
```{r cbind-ex_acces, echo = TRUE}
x <- c(5, 7 , 9)
y <- c(6, 3 , 4)
z <- cbind(x, y)
z[c(1,2), ]
z[, -1]
z[2, ]
```
## Data frames (tibbles) and lists
* A data frame is a matrix where the columns can have different data types
* Rows represent the samples and columns the variables
* the tidyverse equivalent of a **data frame** is called a **tibble**
---
* example: import a small csv table
```{r load-Patients, echo = TRUE}
pat <- read_csv("http://www-huber.embl.de/users/klaus/BasicR/Patients.csv")
pat
```
## Accessing data in data frames
* **filter** (for rows) and **select** (for columns / variables) from the tidyverse package **dplyr**
```{r subset_data}
pat_tiny <- filter(pat, Height < 1.7)
select(pat_tiny, PatientId, Height, Gender)
```
There are a couple of operators useful for comparisons:
* `Variable == value`: equal
* `Variable != value`: not equal
* `Variable < value`: less
* `Variable > value`: greater
* `&`: and
* `|`: or
* `!`: negation
* `%in%`: is element?
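As a minimal sketch of how these operators combine, the base R snippet below builds the kind of logical vector that `filter` evaluates row by row. The vectors `height` and `gender` are made-up illustration values, not taken from the `Patients.csv` table.

```r
# Hypothetical example values, not the course data
height <- c(1.65, 1.80, 1.55, 1.72)
gender <- c("f", "m", "f", "m")

# TRUE where both conditions hold
small_women <- height < 1.7 & gender == "f"

# TRUE where at least one condition holds
small_or_male <- height < 1.7 | gender == "m"

# negation and membership test
not_small <- !(height < 1.7)
is_female <- gender %in% c("f")

# a logical vector can be used to subset directly
height[small_women]
```

Inside `filter`, the same expressions are written with the column names of the tibble instead of free-standing vectors.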
## Vectors with arbitrary contents: Lists
```{r list_example, echo = TRUE}
L <- list(one = 1, two = c(1, 2), five = seq(1, 4, length.out = 5),
list(string = "Hello World"))
L
```
------------------
* access via the double bracket operator
<http://r4ds.had.co.nz/vectors.html#visualising-lists>
```{r list_access, echo = TRUE}
names(L)
L$five + 10
L[[3]] + 10
L[["two"]]
```
---
* data frames = special lists;
=> can be accessed in the same way
```{r list-example_df, echo = TRUE}
pat$Height
pat[[2]]
pat[["Gender"]]
```
## Applying a function to elements of a data structure
* instead of looping through a data structure, apply a function to each element
* the **bodyfat** data set contains various body measures for 252 men
```{r loadBodyfat, echo = TRUE}
load(url("http://www-huber.embl.de/users/klaus/BasicR/bodyfat.rda"))
bodyfat <- as_tibble(bodyfat)
bodyfat
```
-----
* the function **map** from the **purrr** package applies another function to every element of a list
* The following code computes the mean value for every variable
```{r bodyfat_map, dependson = "loadBodyfat"}
head(map_dbl(bodyfat, mean))
```
## Custom functions
* The map functions are really useful for applying your custom functions
* function template:
```{r function_template, eval=FALSE}
function_name <- function(argument_1, argument_2,
optional_argument = default_value )
{
return(...)
}
```
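To make the template concrete, here is a small sketch with one required and one optional argument. The function name `scale_by` and its arguments are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: divide a numeric vector by a scaling factor.
# `factor` is optional and defaults to the maximum of x.
scale_by <- function(x, factor = max(x)) {
  return(x / factor)
}

scale_by(c(2, 4, 8))              # uses the default: divides by 8
scale_by(c(2, 4, 8), factor = 2)  # overrides the default
```

Default values are evaluated lazily in R, which is why the default for `factor` may refer to `x`.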
---
* let's compute a robust z--score for every variable
```{r robust_z}
robust_z <- function(x){
(x - median(x)) / mad(x)
}
map_df(bodyfat, robust_z)
```
---
* you can define a function implicitly via the **.x** and **.y** arguments
```{r robust_z_implicit}
map_df(bodyfat, ~ (.x - median(.x)) / mad(.x))
```
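The `.y` placeholder only comes into play when mapping over two inputs in parallel. A minimal sketch, assuming `purrr` is installed and using made-up weights and heights rather than the course data:

```r
library(purrr)

# Hypothetical example values
weight_kg <- c(60, 75, 90)
height_m  <- c(1.60, 1.75, 1.80)

# map2_dbl walks over both vectors in parallel:
# .x is the current element of the first, .y of the second
bmi <- map2_dbl(weight_kg, height_m, ~ .x / .y^2)
bmi
```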
## Transforming variables
* The function **mutate** takes the input data and computes derived quantities
* Here, change the units from the imperial system to the metric one
```{r transform_example}
pb_to_kg <- 1/2.2046
inch_to_m <- 0.0254
bodyfat <- mutate(bodyfat, height_m = height * inch_to_m,
weight_kg = weight * pb_to_kg)
select(bodyfat, height, height_m, weight, weight_kg)
```
## Simple plotting in R: "qplot" of ggplot2
* The package **ggplot2** allows very flexible plotting in R
* it takes a while to get acquainted with the underlying "grammar of graphics"
* we will use its function **`qplot()`** for "quick plotting"
```{r qplot, eval=FALSE}
qplot(x, y = NULL, ..., data, facets = NULL,
margins = FALSE, geom = "auto", xlim = c(NA, NA),
ylim = c(NA, NA), log = "", main = NULL,
xlab = , ylab = )
```
----
The arguments are:
* `x`: x--axis data
* `y`: y--axis data (may be missing)
* `data`: `data.frame` containing the variables used in the plot
* `facets`: split the plot into facets; use a one--sided formula like
`~ split` for wrapped facets and a two--sided formula like `rows ~ columns` to split by rows and columns
* `main`: plot heading
* `color`, `fill`: set to a factor/string in the data set in order to
color the plot depending on that factor. Use `I("colorname")` to use a
specific color.
* `geom`: specify a "geometry" to be used in the plot; examples
include point, line, boxplot, histogram etc.
* `xlab`, `ylab`, `xlim`, `ylim`: set the x--/y--axis parameters
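A short sketch of how several of these arguments fit together, using the `mtcars` data set that ships with R (an assumption for illustration; it is not part of the course material). Recent `ggplot2` releases mark `qplot()` as deprecated, so a warning may appear.

```r
library(ggplot2)

# scatter plot of weight against miles per gallon,
# fixed point color via I(), one wrapped facet per cylinder count
p <- qplot(mpg, wt, data = mtcars,
           geom = "point", color = I("steelblue"),
           facets = ~ cyl,
           main = "Weight vs. mpg",
           xlab = "Miles per gallon",
           ylab = "Weight (1000 lbs)")
p
```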
## A qplot example using the bodyfat data
* plot of **percent.fat** against abdomen circumference
```{r qplot_example, fig.show='hide'}
bodyfat <- mutate(bodyfat, weight_binned = cut(weight_kg, 5))
qplot(abdomen.circum, percent.fat,
color = weight_binned, data = bodyfat)
```
* abdomen circumference, weight and body fat are highly correlated with each other
<img src="Slides_tidyverse_R_intro_files/figure-slidy/qplot_example-1.png" alt="Scatter plot of percent.fat against abdomen.circum, colored by binned weight" height="75%" width="75%">
----
* The same data plotted using facets
```{r qplot_example_facets, fig.show='hide'}
qplot(abdomen.circum, percent.fat,
color = weight_binned, data = bodyfat,
facets = ~weight_binned)
```
<img src="Slides_tidyverse_R_intro_files/figure-slidy/qplot_example_facets-1.png" alt="Scatter plot of percent.fat against abdomen.circum, faceted by binned weight" height="75%" width="75%">
## Programming statements
R offers the typical options for flow--control known from many other
languages.
* the **if--statement** checks whether a certain condition is met
```{r if_example, echo = TRUE, eval = TRUE}
w <- 3
if (w < 5) {
d <- 2
} else {
d <- 10
}
d
```
----
* iterations are performed via __for--loops__
```{r for_example, echo = TRUE, eval = TRUE}
h <- seq(from = 1, to = 8)
s <- numeric() # create empty vector
for (i in 1:8)
{
s[i] <- h[i] * 10
}
s
```
Note, however, that you should typically use a `map` function for this
purpose, as this leads to more readable code:
```{r maps_for}
map_dbl(h, ~.x*10)
```