Commit 4ab56964 by Bernd Klaus

### small edits and extensions

parent 27926c0b
 ... ... @@ -65,7 +65,7 @@ Bernd Klaus
1. Starting point: Raw data in various formats
2. Import, reshape the data into required formats
3. Perfom computations on your data, plot it
3. Perform computations on your data, plot it
• We will make use of the tidyverse: a set of packages that make using R easier

... ... @@ -124,30 +124,33 @@ z <- x + y z
[1] 10
• The most basic elementary data structure in R are vectors:
• The most basic elementary data structure in R are vectors

• can be created by concatenating individual elements via the c function

x <- c(7.5, 8.2, 3.1, 5.6, 8.2)
• can be created by concatenating individual elements via the c function
• Subsets are created by the bracket operator (1-based counting!)
• Subsets are created by the bracket operator (1-based counting!)

• The indices of the elements you want to access are given inside the square brackets

• head gives the first 6 elements of a data structure

[1] 7.5 8.2 3.1 5.6 8.2
x[c(1, 2, 4)]
[1] 7.5 8.2 5.6
x[-(1:3)]
[1] 5.6 8.2
[1] 7.5 8.2 3.1 5.6 8.2
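As a small sketch of the bracket operator described in this hunk, the lines below reuse the vector x from the slides. The names first_two, without_first and first_three are chosen here for illustration only and are not part of the commit.

```r
x <- c(7.5, 8.2, 3.1, 5.6, 8.2)

# Positive indices select elements (counting starts at 1)
first_two <- x[1:2]

# Negative indices drop the given elements instead
without_first <- x[-1]

# head() with an explicit n returns the first n elements
first_three <- head(x, 3)

print(first_two)
print(without_first)
print(first_three)
```

Positive and negative indices cannot be mixed within a single pair of brackets.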

Matrices in R

Matrices are two–dimensional vectors, the simplest is to create the columns and then glue them together with the command cbind

x <- c(5, 7 , 9)
y <- c(6, 3 , 4)

• Matrices are two–dimensional vectors

• the simplest is to create the columns and then glue them together with the command cbind

x <- c(5, 7, 9)
y <- c(6, 3, 4)
z <- cbind(x, y)
z
x y
...  ...  @@ -160,12 +163,9 @@ z
• Access is now two–dimensional
• Access now works by specifying indices for rows and columns
x <- c(5, 7 , 9)
y <- c(6, 3 , 4)
z <- cbind(x, y)
z[c(1,2), ]
z[c(1,2), ]
x y
[1,] 5 6
[2,] 7 3
... ... @@ -178,7 +178,7 @@ z[c(1,2)

Data frames (tibbles) and lists

• A data frame is a matrix where the columns can have different data types

• A data frame is a matrix where the columns can have different data types

• Rows represent the samples and columns the variables

• the tidyverse equivalent of a data frame is called a tibble

... ... @@ -186,7 +186,7 @@ z[c(1,2)
• example: import a small csv table
• import a small csv table with read_csv from the readr package
Parsed with column specification:
...  ...  @@ -219,7 +219,7 @@ z[c(1,2)

There are a couple of operators useful for comparisons:

• Variable == value: equal
• Variable != value: un–equal
• Variable != value: unequal
• Variable < value: less
• Variable > value: greater
• &: and
... ... @@ -228,6 +228,22 @@ z[c(1,2)
• %in%: is element?
• You can also access data frames via indices or row / column names
pat[2, c("PatientId", "Height")]
# A tibble: 1 × 2
PatientId Height
<chr>  <dbl>
1        P2    1.9
pat["P2", c(1, 2)]
# A tibble: 1 × 2
PatientId Height
<chr>  <dbl>
1      <NA>     NA
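The comparison operators and the index-based access shown above can be combined. This is a minimal sketch using a base R data frame; the table pat_demo and its values are made up for illustration and are not the pat data from the slides.

```r
# Hypothetical patient table; all values are invented for this example
pat_demo <- data.frame(
  PatientId = c("P1", "P2", "P3"),
  Height = c(1.65, 1.90, 1.72),
  stringsAsFactors = FALSE
)

# A comparison returns one logical value per row
tall <- pat_demo$Height > 1.7

# Conditions are combined with &, membership is tested with %in%
selected <- pat_demo[tall & pat_demo$PatientId %in% c("P2", "P3"), ]

# Index-based access: row 2, columns chosen by name
second <- pat_demo[2, c("PatientId", "Height")]

print(selected)
print(second)
```

Access by row name, as in pat["P2", c(1, 2)] above, only returns data when the table actually carries row names. A tibble does not, which is consistent with the NA row in the output shown.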

Vectors with arbitrary contents: Lists

L <- list(one = 1, two = c(1, 2), five = seq(1, 4, length = 5),
...  ...  @@ -249,7 +265,8 @@ L
• access via the double bracket operator
• access works via the double bracket operator

• recursively accesses the list contents

names(L)
... ... @@ -264,7 +281,7 @@ L
• data frames = special lists;
• data frames = special lists

=> can be accessed in the same way

pat$Height
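To illustrate the difference between the access operators, here is a short sketch on a hypothetical list L_demo. It is not the list L from the slides, whose definition is truncated in this diff.

```r
# Hypothetical list, used only to illustrate element access
L_demo <- list(one = 1, two = c(1, 2))

# Double brackets return the element itself
elem <- L_demo[["two"]]

# Single brackets return a sub-list of length one
sub <- L_demo["two"]

# The $ operator is shorthand for [[ ]] with a name
same <- L_demo$two

print(elem)
print(class(sub))
```

Because data frames are lists of columns, the same three forms work on them as well.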
... ... @@ -304,7 +321,7 @@ bodyfat
• the function map from the purrr package applies another function to every element of a list

• the function map from the purrr package applies another function to every element of a list or vector

• The following code computes the mean value for every variable

... ... @@ -314,14 +331,17 @@ bodyfat

Custom functions

• The map functions are really useful for applying your custom functions

• function template:

• The map functions are useful for applying your custom functions

• function template for R:

function_name <- function(argument_1, argument_2,
optional_argument = defautl_value )
optional_argument = default_value )
{
return(...)
}
• the return statement is optional, by default the result of the last computation is returned
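A minimal sketch of the template filled in, assuming the purrr package is installed. The function scale_value, its argument names and the input h_demo are hypothetical and chosen here for illustration.

```r
library(purrr)  # provides map_dbl

# Hypothetical helper: multiply a value by an optional factor
scale_value <- function(value, factor = 10) {
  # No return(): the result of the last expression is returned
  value * factor
}

h_demo <- 1:8

# Apply the function to every element, using the default factor
scaled_default <- map_dbl(h_demo, scale_value)

# Override the optional argument inside a formula shorthand
scaled_double <- map_dbl(h_demo, ~ scale_value(.x, factor = 2))

print(scaled_default)
print(scaled_double)
```

map_dbl is used rather than map because the function returns a single number per element, so the result comes back as a numeric vector instead of a list.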
... ... @@ -407,8 +427,8 @@ bodyfat <- mutate(bodyfat, Simple plotting in R: “qplot” of ggplot2
• The package ggplot2 allows very flexible plotting in R
• it takes a while to get acquainted with the underlying “grammer of graphics”
• we will use its function qplot() for “quick plotting”
• however, it takes a while to get acquainted with the underlying “grammar of graphics”
• we will introduce its function qplot() for “quick plotting”
qplot(x, y = NULL, ..., data, facets = NULL,
NA), ylim = c(NA, NA), log = "", main = NULL,
...  ...  @@ -431,7 +451,8 @@ bodyfat <- mutate(bodyfat,

A qplot example using the bodyfat data

• plot of perc.fat against abdomen circumference
• plot of perc.fat against abdomen.circum
• we use cut to bin the weight data
bodyfat <- mutate(bodyfat, weight_binned = cut(weight_kg, 5))

...  ...  @@ -481,7 +502,7 @@ s <- numeric()
map_dbl(h, ~.x*10)
map_dbl(h, ~.x * 10)
[1] 10 20 30 40 50 60 70 80
... ...