Commit c24fe0b7 authored by Bernd Klaus

worked on mapping/apply functions

parent 735ef363
......@@ -87,6 +87,7 @@ summary(x[7:12])
## ----subscr_2, echo = TRUE---------------------------------------------
x[c(2,4,9)]
x[-(1:6)]
head(x)
## ----sort-rank, echo = TRUE--------------------------------------------
x <- c(1.3, 3.5, 2.7, 6.3, 6.3)
......@@ -95,6 +96,12 @@ order(x)
x[order(x)]
rank(x)
## ----factors-------------------------------------------------------------
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))
x
## ----object-examples, echo = TRUE--------------------------------------
a <- 9
# is a a string?
......@@ -178,12 +185,30 @@ L[2]
pat$Height
pat[[2]]
## ----apply-example, echo = TRUE----------------------------------------
# Calculate mean for each of the first two columns
sapply(X = pat[,1:2], FUN = mean, na.rm = TRUE)
# Mean height separately for each gender
tapply(X = pat$Height, FUN = mean, INDEX = pat$Ge)
## ----loadBodyfat, echo = TRUE------------------------------------------
load(url("http://www-huber.embl.de/users/klaus/BasicR/bodyfat.rda"))
bodyfat <- as_tibble(bodyfat)
bodyfat
## ----bodyfat_map, dependson = "loadBodyfat"------------------------------
head(map_dbl(bodyfat, mean))
## ----function_template, eval=FALSE---------------------------------------
## function_name <- function(arguments, options)
## {
## return(...)
## }
## ----robust_z------------------------------------------------------------
robust_z <- function(x){
(x - median(x)) / mad(x)
}
head(map_df(bodyfat, robust_z), 3)
## ----robust_z_implicit---------------------------------------------------
head(map_df(bodyfat, ~ (.x - median(.x)) / mad(.x)), 3)
## ----plot-example, echo = TRUE-----------------------------------------
#pdf(file="plot-example.pdf", width=12, height=6)
......
......@@ -321,9 +321,11 @@ the elements:
```{r subscr_2, echo = TRUE}
x[c(2,4,9)]
x[-(1:6)]
head(x)
```
Additionally, there are some useful commands to order and sort vectors
The function `head` provides a preview of the vector. There are also
useful functions to order and sort vectors:
* `sort`: sort in increasing order
* `order`: orders the indexes in such a way that the elements
......@@ -377,11 +379,21 @@ There are the following elementary types or ("modes"):
* numeric: real number
* character: chain of characters, text
* factor: String or numbers, describing certain categories
* factor: categorical data that takes a fixed set of values
* logical: TRUE, FALSE
* special values: NA (missing value), NULL ("empty object"),
Inf, -Inf (infinity), NaN (not a number)
Factors are designed to represent categorical data that can take a fixed set
of possible values. Factors are built on top of integers, and have a levels attribute:
```{r factors}
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))
x
```
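As a rough illustration of the statement that factors are built on top of integers, the following minimal sketch in base R looks at the integer codes and the levels separately. The variable names `codes`, `lev` and `counts` are made up for this example. Note that the unused level `ef` remains part of the factor:

```r
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))

# Integer codes that back the factor: the position of each value in levels(x)
codes <- as.integer(x)
codes

# The levels attribute, including the unused level "ef"
lev <- levels(x)
lev

# Counts per level; unused levels show up with a count of zero
counts <- table(x)
counts
```

Because the levels are fixed, a value that is not among them would be turned into `NA` when the factor is created.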
Data storage types include matrices, lists and data frames (tibbles), which will be introduced
in the next section. Certain types can have different subtypes, e.g. numeric
......@@ -613,64 +625,112 @@ pat$Height
pat[[2]]
```
More on lists can be found in the respective chapter of "R for data science"
[here](http://r4ds.had.co.nz/vectors.html#lists).
## Apply, mapping and custom functions
R encourages the use of functions for programming: instead of, e.g., looping through
a vector or data frame, you call a function on your data directly. These
kinds of functions are called apply functions. Here, we will use the `map` family
of functions from the `r CRANpkg("purrr")` package instead of the base R functions.
An apply / map call applies another function to each element of a vector or list and
returns the results in another vector/list.
Thus, each step consists of "mapping" a list value to a result.
We will introduce the `map` functions by looking at a typical data set in a tabular
format, where
the rows represent the samples and the columns the variables measured. The
data set `bodyfat` contains various body measures for 252 men. We turn
it into a tibble by using the function `as_tibble()`.
Let's inspect it a bit. The first thing we notice
is that tibbles print only the first 10 rows by default. Tibbles are designed
so that you don't accidentally overwhelm your console when you print large data
frames. Additionally, we get a nice summary of the variables available in our
data set.
```{r loadBodyfat, echo = TRUE}
load(url("http://www-huber.embl.de/users/klaus/BasicR/bodyfat.rda"))
bodyfat <- as_tibble(bodyfat)
bodyfat
```
## Apply functions
As data frames are just a special kind of list, namely a list that is composed of
vectors of equal length, we can use a map function to compute the mean value for every
variable in our data set.
A very useful class of functions in R are
`apply` commands, which allows to apply a function to every row or column of
a data matrix, data frame or list:
```{r bodyfat_map, dependson = "loadBodyfat"}
head(map_dbl(bodyfat, mean))
```
\begin{center}
` apply(X, MARGIN, FUN, ...) `
\end{center}
Here we use `map_dbl` to ensure that we get a double vector back. There
are specialized mapping functions for many data types, but you can always
use the default `map()` function, which returns a list, as a fallback when
there is no specialized equivalent available.
* ` MARGIN:` 1 (row-wise) or 2 (column-wise)
* ` FUN:` The function to apply
The map functions are really useful for applying your custom functions. For example,
we can compute a robust z--score by subtracting the median and dividing by the median
absolute deviation (MAD) for each variable.
This will bring all the variables in the data set to a common scale and make
them directly comparable. These kinds of transformations are often performed
before clustering or dimensionality reduction.
You can create your own functions very easily by adhering to the following
template:
The dots argument allows you to specify additional arguments that will be
passed to `FUN`.
```{r function_template, eval=FALSE}
function_name <- function(arguments, options)
{
return(...)
}
```
Special apply functions include: `lapply` (lists),
`sapply` (lapply
wrapper trying to convert the result into a vector or matrix),
`tapply` and aggregate (apply according to factor groups).
As you can see, the source code of the function has to be in curly brackets.
By default, R returns the result of the last computation performed within the
curly brackets (often, this will be the last line of the function). However,
you can always specify the return value directly with `return()`. If
you want to return multiple values, you can return a list.
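To make the point about returning multiple values concrete, here is a minimal sketch in base R. The function name `center_and_spread` and the toy input vector are assumptions made up for this illustration; they are not part of the lab material.

```r
# Hypothetical helper: returns two robust summary statistics in a named list
center_and_spread <- function(x) {
  center <- median(x)
  spread <- mad(x)
  # The last expression is the return value; return() is optional here
  list(center = center, spread = spread)
}

res <- center_and_spread(c(1, 2, 3, 4, 100))

# Access the individual results by name
res$center
res$spread
```

Naming the list elements keeps the call site readable, since the results can be pulled out with `$` instead of by position.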
We can illustrate this again using the patients data set:
We can now easily define our function and apply it to the data set.
```{r robust_z}
robust_z <- function(x){
(x - median(x)) / mad(x)
}
```{r apply-example, echo = TRUE}
# Calculate mean for each of the first two columns
sapply(X = pat[,1:2], FUN = mean, na.rm = TRUE)
# Mean height separately for each gender
tapply(X = pat$Height, FUN = mean, INDEX = pat$Ge)
head(map_df(bodyfat, robust_z), 3)
```
Here, we used the function `map_df` to make sure that we get a data frame back.
There is an even simpler way to achieve the same goal. Using a tilde (~) to create
an R formula, the map
functions allow you to define anonymous functions with a default argument `.x`.
With this, we do not need to define our robust z--score function explicitly.
Data handling can be much more elegantly performed by the \CRANpkg{plyr}
and \CRANpkg{dplyr} packages, which will be introduced in another lab.
```{r robust_z_implicit}
head(map_df(bodyfat, ~ (.x - median(.x)) / mad(.x)), 3)
```
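The formula shorthand is just another way of writing an anonymous function. The sketch below compares the named function, the formula shorthand and an ordinary anonymous function. It uses a small made-up data frame `toy` rather than the `bodyfat` data and plain `map()` so that only `purrr` is needed; it assumes that `purrr` is installed.

```r
library(purrr)

# Small made-up data set, only for illustration
toy <- data.frame(a = c(1, 2, 3, 4, 100), b = c(10, 20, 30, 40, 50))

robust_z <- function(x) {
  (x - median(x)) / mad(x)
}

# Three equivalent ways to apply the same computation to every column
z_named   <- map(toy, robust_z)
z_formula <- map(toy, ~ (.x - median(.x)) / mad(.x))
z_lambda  <- map(toy, function(x) (x - median(x)) / mad(x))

identical(z_named, z_formula)
identical(z_named, z_lambda)
```

The named function is usually the better choice when the computation is reused in several places, while the formula shorthand is handy for short one-off transformations.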
### Exercise: Handling a small data set
## Computing variables from existing ones and predicate functions
* Read in the data set \file{Patients.csv} from the website
__Exercise: Handling a small data set__
\url{http://www-huber.embl.de/users/klaus/BasicR/Patients.csv}
* Read in the data set `Patients.csv` from the website
[http://www-huber.embl.de/users/klaus/BasicR/Patients.csv](http://www-huber.embl.de/users/klaus/BasicR/Patients.csv)
* Check whether the read in data is actually a `data.frame`.
* Check whether the read in data is actually a `data.frame`. Make sure that it
is a tibble!
* Which variables are stored in the data frame and what are their values?
* Is there a missing weight value? If yes, replace it by the mean of the other weight values.
* Calculate the mean weight and height of all the patients.
* Calculate the $\text{BMI}= \text{Weight} / \text{Height}^2$ of all the patients.
Attach the BMI vector to the data frame using the function `cbind`.
* Calculate the BMI = Weight / Height^2 of all the patients.
* Attach the BMI vector to the data frame using the function `cbind`.
## Plotting in R
......@@ -896,26 +956,6 @@ i.e. if you do not change
them, their default values are used.
## Creating your own functions
You can create your own functions very easily by adhering to the following
template \
`function.name<-function(arguments, options) `
{
return(...)
}
* The source code of the function has
to be in curly brackets
* By default R
returns the result of the last line of the function, you can specify the return
value directly with `return()`. If you want to return
multiple values, you can return a list.
......
......@@ -423,7 +423,10 @@ x +<span class="st"> </span><span class="dv">2</span></code></pre></div>
<pre><code> [1] 8.2 5.6 9.3</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x[-(<span class="dv">1</span>:<span class="dv">6</span>)]</code></pre></div>
<pre><code> [1] 6.5 7.0 9.3 1.2 14.5 6.2</code></pre>
<p>Additionally, there are some useful commands to order and sort vectors</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">head</span>(x)</code></pre></div>
<pre><code> [1] 7.5 8.2 3.1 5.6 8.2 9.3</code></pre>
<p>The function <code>head</code> provides a preview of the vector. There are also<br />
useful functions to order and sort vectors:</p>
<ul>
<li><code>sort</code>: sort in increasing order</li>
<li><p><code>order</code>: orders the indexes in such a way that the elements of the vector are sorted, i.e. <code>sort(v) = v[order(v)]</code></p></li>
......@@ -459,10 +462,15 @@ x +<span class="st"> </span><span class="dv">2</span></code></pre></div>
<ul>
<li>numeric: real number</li>
<li>character: chain of characters, text</li>
<li>factor: String or numbers, describing certain categories</li>
<li>factor: categorical data that takes a fixed set of values</li>
<li>logical: TRUE, FALSE</li>
<li>special values: NA (missing value), NULL (“empty object”), Inf, -Inf (infinity), NaN (not a number)</li>
</ul>
<p>Factors are designed to represent categorical data that can take a fixed set of possible values. Factors are built on top of integers, and have a levels attribute:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x &lt;-<span class="st"> </span><span class="kw">factor</span>(<span class="kw">c</span>(<span class="st">&quot;ab&quot;</span>, <span class="st">&quot;cd&quot;</span>, <span class="st">&quot;ab&quot;</span>), <span class="dt">levels =</span> <span class="kw">c</span>(<span class="st">&quot;ab&quot;</span>, <span class="st">&quot;cd&quot;</span>, <span class="st">&quot;ef&quot;</span>))
x</code></pre></div>
<pre><code> [1] ab cd ab
Levels: ab cd ef</code></pre>
<p>Data storage types include matrices, lists and data frames (tibbles), which will be introduced in the next section. Certain types can have different subtypes, e.g. numeric can be further subdivided into the integer, single and double types. Types can be checked by the <code>is.*</code> and changed (“cast”) by the <code>as.*</code> functions. Furthermore, the function <code>str</code> is very useful in order to obtain an overview of a (possibly complex) object at hand. The following examples will make this clear. We first assign the value <code>9</code> to an object and then perform various operations on it.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">a &lt;-<span class="st"> </span><span class="dv">9</span>
<span class="co"># is a a string?</span>
......@@ -663,52 +671,97 @@ L</code></pre></div>
<pre><code> [1] 1.65 1.30 1.20</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">pat[[<span class="dv">2</span>]]</code></pre></div>
<pre><code> [1] 1.65 1.30 1.20</code></pre>
<p>More on lists can be found in the respective chapter of “R for data science” <a href="http://r4ds.had.co.nz/vectors.html#lists">here</a>.</p>
</div>
</div>
<div id="apply-functions" class="section level2">
<h2><span class="header-section-number">6.3</span> Apply functions</h2>
<p>A very useful class of functions in R are<br />
<code>apply</code> commands, which allows to apply a function to every row or column of a data matrix, data frame or list:</p>
<div id="apply-mapping-and-custom-functions" class="section level2">
<h2><span class="header-section-number">6.3</span> Apply, mapping and custom functions</h2>
<p>R encourages the use of functions for programming: instead of, e.g., looping through a vector or data frame, you call a function on your data directly. These kinds of functions are called apply functions. Here, we will use the <code>map</code> family of functions from the <em><a href="http://cran.fhcrc.org/web/packages/purr/index.html">purrr</a></em> package instead of the base R functions. An apply / map call applies another function to each element of a vector or list and returns the results in another vector/list.</p>
<p>Thus, each step consists of “mapping” a list value to a result. We will introduce the <code>map</code> functions by looking at a typical data set in a tabular format, where the rows represent the samples and the columns the variables measured. The data set <code>bodyfat</code> contains various body measures for 252 men. We turn it into a tibble by using the function <code>as_tibble()</code>.</p>
<p>Let’s inspect it a bit. The first thing we notice is that tibbles print only the first 10 rows by default. Tibbles are designed so that you don’t accidentally overwhelm your console when you print large data frames. Additionally, we get a nice summary of the variables available in our data set.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">load</span>(<span class="kw">url</span>(<span class="st">&quot;http://www-huber.embl.de/users/klaus/BasicR/bodyfat.rda&quot;</span>))
bodyfat &lt;-<span class="st"> </span><span class="kw">as_tibble</span>(bodyfat)
bodyfat</code></pre></div>
<pre><code> # A tibble: 252 × 15
density percent.fat age weight height neck.circum chest.circum
&lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 1.07 12.3 23 154 67.8 36.2 93.1
2 1.09 6.1 22 173 72.2 38.5 93.6
3 1.04 25.3 22 154 66.2 34.0 95.8
4 1.08 10.4 26 185 72.2 37.4 101.8
5 1.03 28.7 24 184 71.2 34.4 97.3
6 1.05 20.9 24 210 74.8 39.0 104.5
7 1.05 19.2 26 181 69.8 36.4 105.1
8 1.07 12.4 25 176 72.5 37.8 99.6
9 1.09 4.1 25 191 74.0 38.1 100.9
10 1.07 11.7 23 198 73.5 42.1 99.6
# ... with 242 more rows, and 8 more variables: abdomen.circum &lt;dbl&gt;,
# hip.circum &lt;dbl&gt;, thigh.circum &lt;dbl&gt;, knee.circum &lt;dbl&gt;,
# ankle.circum &lt;dbl&gt;, bicep.circum &lt;dbl&gt;, forearm.circum &lt;dbl&gt;,
# wrist.circum &lt;dbl&gt;</code></pre>
<p>As data frames are just a special kind of list, namely a list that is composed of vectors of equal length, we can use a map function to compute the mean value for every variable in our data set.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">head</span>(<span class="kw">map_dbl</span>(bodyfat, mean))</code></pre></div>
<pre><code> density percent.fat age weight height neck.circum
1.06 19.15 44.88 178.92 70.15 37.99</code></pre>
<p>Here we use <code>map_dbl</code> to ensure that we get a double vector back. There are specialized mapping functions for many data types, but you can always use the default <code>map()</code> function, which returns a list, as a fallback when there is no specialized equivalent available.</p>
<p>The map functions are really useful for applying your custom functions. For example, we can compute a robust z–score by subtracting the median and dividing by the median absolute deviation (MAD) for each variable.</p>
<p>This will bring all the variables in the data set to a common scale and make them directly comparable. These kinds of transformations are often performed before clustering or dimensionality reduction.</p>
<p>You can create your own functions very easily by adhering to the following template:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">function_name &lt;-<span class="st"> </span>function(arguments, options)
{
<span class="kw">return</span>(...)
}</code></pre></div>
<p>As you can see, the source code of the function has to be in curly brackets. By default, R returns the result of the last computation performed within the curly brackets (often, this will be the last line of the function). However, you can always specify the return value directly with <code>return()</code>. If you want to return multiple values, you can return a list.</p>
<p>We can now easily define our function and apply it to the data set.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">robust_z &lt;-<span class="st"> </span>function(x){
(x -<span class="st"> </span><span class="kw">median</span>(x)) /<span class="st"> </span><span class="kw">mad</span>(x)
}
 
<span class="kw">head</span>(<span class="kw">map_df</span>(bodyfat, robust_z), <span class="dv">3</span>)</code></pre></div>
<pre><code> # A tibble: 3 × 15
density percent.fat age weight height neck.circum chest.circum
&lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 0.763 -0.745 -1.69 -0.775 -0.759 -0.759 -0.782
2 1.459 -1.414 -1.77 -0.113 0.759 0.211 -0.722
3 -0.648 0.658 -1.77 -0.783 -1.265 -1.686 -0.460
# ... with 8 more variables: abdomen.circum &lt;dbl&gt;, hip.circum &lt;dbl&gt;,
# thigh.circum &lt;dbl&gt;, knee.circum &lt;dbl&gt;, ankle.circum &lt;dbl&gt;,
# bicep.circum &lt;dbl&gt;, forearm.circum &lt;dbl&gt;, wrist.circum &lt;dbl&gt;</code></pre>
<p>Here, we used the function <code>map_df</code> to make sure that we get a data frame back. There is an even simpler way to achieve the same goal. Using a tilde (~) to create an R formula, the map functions allow you to define anonymous functions with a default argument <code>.x</code>.</p>
<p>With this, we do not need to define our robust z–score function explicitly.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">head</span>(<span class="kw">map_df</span>(bodyfat, ~<span class="st"> </span>(.x -<span class="st"> </span><span class="kw">median</span>(.x)) /<span class="st"> </span><span class="kw">mad</span>(.x)), <span class="dv">3</span>)</code></pre></div>
<pre><code> # A tibble: 3 × 15
density percent.fat age weight height neck.circum chest.circum
&lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 0.763 -0.745 -1.69 -0.775 -0.759 -0.759 -0.782
2 1.459 -1.414 -1.77 -0.113 0.759 0.211 -0.722
3 -0.648 0.658 -1.77 -0.783 -1.265 -1.686 -0.460
# ... with 8 more variables: abdomen.circum &lt;dbl&gt;, hip.circum &lt;dbl&gt;,
# thigh.circum &lt;dbl&gt;, knee.circum &lt;dbl&gt;, ankle.circum &lt;dbl&gt;,
# bicep.circum &lt;dbl&gt;, forearm.circum &lt;dbl&gt;, wrist.circum &lt;dbl&gt;</code></pre>
</div>
<div id="computing-variables-from-existing-ones-and-predicate-functions" class="section level2">
<h2><span class="header-section-number">6.4</span> Computing variables from existing ones and predicate functions</h2>
<p><strong>Exercise: Handling a small data set</strong></p>
<ul>
<li><code>MARGIN:</code> 1 (row-wise) or 2 (column-wise)</li>
<li><code>FUN:</code> The function to apply</li>
</ul>
<p>The dots argument allows you to specify additional arguments that will be passed to <code>FUN</code>.</p>
<p>Special apply functions include: <code>lapply</code> (lists), <code>sapply</code> (lapply wrapper trying to convert the result into a vector or matrix), <code>tapply</code> and aggregate (apply according to factor groups).</p>
<p>We can illustrate this again using the patients data set:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Calculate mean for each of the first two columns</span>
<span class="kw">sapply</span>(<span class="dt">X =</span> pat[,<span class="dv">1</span>:<span class="dv">2</span>], <span class="dt">FUN =</span> mean, <span class="dt">na.rm =</span> <span class="ot">TRUE</span>)</code></pre></div>
<pre><code> Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
returning NA</code></pre>
<pre><code> PatientId Height
NA 1.38</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Mean height separately for each gender</span>
<span class="kw">tapply</span>(<span class="dt">X =</span> pat$Height, <span class="dt">FUN =</span> mean, <span class="dt">INDEX =</span> pat$Ge) </code></pre></div>
<pre><code> f m
1.42 1.30</code></pre>
<p>Data handling can be much more elegantly performed by the and packages, which will be introduced in another lab.</p>
<div id="exercise-handling-a-small-data-set" class="section level3">
<h3><span class="header-section-number">6.3.1</span> Exercise: Handling a small data set</h3>
<ul>
<li>Read in the data set from the website</li>
<li>Read in the data set <code>Patients.csv</code> from the website</li>
</ul>
<p></p>
<p><a href="http://www-huber.embl.de/users/klaus/BasicR/Patients.csv" class="uri">http://www-huber.embl.de/users/klaus/BasicR/Patients.csv</a></p>
<ul>
<li>Check whether the read in data is actually a <code>data.frame</code>.</li>
<li>Check whether the read in data is actually a <code>data.frame</code>. Make sure that it is a tibble!</li>
<li>Which variables are stored in the data frame and what are their values?</li>
<li>Is there a missing weight value? If yes, replace it by the mean of the other weight values.</li>
<li>Calculate the mean weight and height of all the patients.<br />
</li>
<li>Calculate the <span class="math inline">\(\text{BMI}= \text{Weight} / \text{Height}^2\)</span> of all the patients. Attach the BMI vector to the data frame using the function <code>cbind</code>.</li>
<li>Calculate the BMI = Weight / Height^2 of all the patients.</li>
<li>Attach the BMI vector to the data frame using the function <code>cbind</code>.</li>
</ul>
</div>
</div>
<div id="plotting-in-r" class="section level2">
<h2><span class="header-section-number">6.4</span> Plotting in R</h2>
<h2><span class="header-section-number">6.5</span> Plotting in R</h2>
</div>
<div id="plotting-in-base-r" class="section level2">
<h2><span class="header-section-number">6.5</span> Plotting in base R</h2>
<h2><span class="header-section-number">6.6</span> Plotting in base R</h2>
<p>The default command for plotting is <code>plot()</code>, there are other specialized commands like <code>hist()</code> or <code>pie()</code>. A collection of such specialized commands (e.g. heatmaps and CI plots) can be found in the package . Another useful visualization package is , which includes a heat–scatterplot. The general <code>plot</code> command looks like this:</p>
 
<ul>
......@@ -732,7 +785,7 @@ x &lt;-<span class="st"> </span><span class="kw">seq</span>(-<span class="dv">3<
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co">#dev.off()</span></code></pre></div>
</div>
<div id="and" class="section level2">
<h2><span class="header-section-number">6.6</span> and </h2>
<h2><span class="header-section-number">6.7</span> and </h2>
<p>There’s a quick plotting function in called <code>qplot()</code> which is meant to be similar to the <code>plot()</code> function from base graphics. You can do a lot with <code>qplot()</code>, including splitting plots by factors, but in order to understand how { works, it is better to approach it from from the layering syntax.</p>
<p>All plots begin with the function <code>ggplot()</code>. <code>ggplot()</code> takes two primary arguments, <code>data</code> is the data frame containing the data to be plotted and <code>aes( )</code> are the aesthetic mappings to pass on to the plot elements.</p>
<p>As you can see, the second argument, <code>aes()</code>, isn’t a normal argument, but another function. Since we’ll never use <code>aes()</code> as a separate function, it might be best to think of it as a special way to pass a list of arguments to the plot.</p>
......@@ -776,7 +829,7 @@ ggsmooth </code></pre></div>
<li><code>xlab, ylab, xlim, ylim</code> set the x–/y–axis parameters</li>
</ul>
<div id="exercise-plotting-the-normal-density" class="section level3">
<h3><span class="header-section-number">6.6.1</span> Exercise: Plotting the normal density</h3>
<h3><span class="header-section-number">6.7.1</span> Exercise: Plotting the normal density</h3>
<p>The density of the normal distribution with expected value <span class="math inline">\(\mu\)</span> and variance <span class="math inline">\(\sigma^2\)</span> is given by: <span class="math display">\[
f(x)
= \frac{1}{\sigma \sqrt{2 \pi}} \exp \left(- \frac{1}{2} \left(\frac{x- \mu}{\sigma}\right)^2 \right)
......@@ -790,10 +843,10 @@ f(x)
</div>
</div>
<div id="calling-functions-and-programming" class="section level2">
<h2><span class="header-section-number">6.7</span> Calling functions and programming</h2>
<h2><span class="header-section-number">6.8</span> Calling functions and programming</h2>
</div>
<div id="calling-functions" class="section level2">
<h2><span class="header-section-number">6.8</span> Calling functions</h2>
<h2><span class="header-section-number">6.9</span> Calling functions</h2>
Every R function follows the pattern below:
 
<ul>
......@@ -810,16 +863,6 @@ Every –function is following the pattern below:
<li><code>{na.rm = FALSE}</code>: Remove missing values?</li>
</ul>
<p>Here, <code>x</code> (usually a vector) has to be given in order to run the function, while the other arguments such as <code>trim</code> are optional, i.e. if you do not change them, their default values are used.</p>
</div>
<div id="creating-your-own-functions" class="section level2">
<h2><span class="header-section-number">6.9</span> Creating your own functions</h2>
<p>You can create your own functions very easily by adhering to the following template<br />
</p>
<p><code>function.name&lt;-function(arguments, options)</code> { return(…) }</p>
<ul>
<li>The source code of the function has to be in curly brackets</li>
<li>By default R returns the result of the last line of the function, you can specify the return value directly with <code>return()</code>. If you want to return multiple values, you can return a list.</li>
</ul>
<p>As example, we look at the following currency converter function</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">euro.calc&lt;-function(x, <span class="dt">currency=</span><span class="st">&quot;US&quot;</span>) {
## currency has a default argument &quot;US&quot;
......@@ -997,7 +1040,7 @@ sc.B &lt;-<span class="st"> </span><span class="kw">as.matrix</span>(<span class
 
fr.A *<span class="st"> </span>sc.B</code></pre></div>
</div>
<div id="exercise-handling-a-small-data-set-1" class="section level3">
<div id="exercise-handling-a-small-data-set" class="section level3">
<h3><span class="header-section-number">6.11.9</span> Exercise: Handling a small data set</h3>
<ul>
<li>Read in the data set from the website</li>