---
title: "32 drugs"
output: 
    md_document:
      preserve_yaml: FALSE
      fig_width: 7
      fig_height: 5
      toc: yes
      toc_depth: 2
---

# Motivation

Getting up to speed with R using dose-response data for 32 drugs against 6
bacterial strains.


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE, dpi=300, 
  fig.path = "output/fig/")
knitr::opts_knit$set(global.par = TRUE)
library('tidyverse')
```


# Tasks

In the following, we go through the most common steps in data analysis:
exploration, transformation (i.e. deriving new variables) and modeling (using
statistical tools to answer questions such as what happens on average, which
conditions differ and so on). Integral to all steps is visualization, i.e.
making graphs.

## Explore

As a first look, exploratory plots are informative and serve as quality
control, i.e. a check that nothing suspicious is going on. Raw OD will suffice
for that.

1. Plot growth curves, i.e. raw OD over time. Input
   [data](doc/tasks/01_dat.csv) and the expected [output](doc/tasks/01_out.pdf)
   plot are provided. The data is for azithromycin against _S. flexneri_ M90T
   from 2022-05-04 (first replicate). _A tip: use `facet_wrap` with the
   `ncol = 1` argument to show each concentration in a separate panel._

2. Try again, now with [data](doc/tasks/02_dat.csv) from two days (plot each
   day in a different color). In addition, transform the y-axis to a
   logarithmic scale. Expected [output](doc/tasks/02_out.pdf). _A tip: you need
   to turn the `Date` variable into a factor._

3. Once more, now with [data](doc/tasks/03_dat.csv) from three days. Expected
   [output](doc/tasks/03_out.pdf). You will run into an issue because there
   were two biological replicates on the third day. There are multiple ways to
   overcome this, but for now I recommend solving it with the `group` aesthetic
   in `aes`, e.g. `ggplot(aes(..., group = Plt))`.
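Tasks 1 to 3 can be sketched in a single `ggplot2` call. This is a minimal sketch, not the author's solution: it builds hypothetical toy data instead of reading the CSV files, and the column names `Time` and `ODraw` are assumptions (only `Date`, `Plt` and `uM` are mentioned above). Check them against the real files, e.g. with `readr::read_csv("doc/tasks/03_dat.csv")`.

```r
library(ggplot2)

# Hypothetical toy data standing in for doc/tasks/03_dat.csv.
# Column names Time and ODraw are assumptions; Date, Plt and uM appear in the tasks.
set.seed(1)
dat <- expand.grid(
  Time = seq(0, 600, by = 60),            # timepoints (unit assumed)
  uM   = c(0, 4, 8),                      # drug concentration
  Date = c("2022-05-04", "2022-05-05"),   # two days
  Plt  = 1:2,                             # two plates per day
  stringsAsFactors = FALSE
)
# Logistic-looking growth, weaker at higher concentrations, plus a little noise
dat$ODraw <- 0.04 + 0.5 / (1 + exp(-(dat$Time - 300) / 60)) / (1 + dat$uM) +
  rnorm(nrow(dat), sd = 0.005)

p <- ggplot(dat, aes(x = Time, y = ODraw,
                     colour = factor(Date),              # task 2: days as discrete colours
                     group  = interaction(Date, Plt))) + # task 3: one line per day and plate
  geom_line() +
  facet_wrap(~ uM, ncol = 1) +  # task 1: one panel per concentration
  scale_y_log10()               # task 2: logarithmic y-axis
p
```

Grouping by `interaction(Date, Plt)` rather than `Plt` alone is a precaution in case plate numbers repeat across days; if `Plt` is already unique in the real data, `group = Plt` as suggested in task 3 is enough. Without `factor()`, a `Date` column parsed as a date would be mapped to a continuous colour gradient instead of one colour per day.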

## Transform

To quantify growth (either rate or yield), one needs to subtract the
background from raw OD. There are two ways to do that: 1) using a readout from
wells with just the medium; 2) using the smallest value per well (i.e. OD in
one of the first timepoints of a particular well). I prefer the former
whenever possible.
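The two background options can be compared with `dplyr` on a hypothetical toy plate. This is a sketch under assumptions: medium-only wells are coded as `uM = -1` (as in task 4 below), and the column names `Time`, `Well` and `ODraw` are placeholders rather than the real CSV headers.

```r
library(dplyr)

# Hypothetical toy data: one plate, two timepoints, a medium-only well (uM = -1)
# and two wells with bacteria. Column names Time, Well and ODraw are assumptions.
dat <- data.frame(
  Time  = rep(c(0, 60), each = 3),
  Well  = rep(c("A1", "A2", "A3"), times = 2),
  uM    = rep(c(-1, 0, 4), times = 2),
  ODraw = c(0.040, 0.045, 0.043,
            0.041, 0.090, 0.060)
)

dat <- dat %>%
  # Option 1: subtract the medium-only well's OD at each timepoint
  group_by(Time) %>%
  mutate(OD_medium = ODraw - ODraw[uM == -1]) %>%
  # Option 2: subtract the smallest OD each well ever reached
  # (group_by() replaces the previous grouping)
  group_by(Well) %>%
  mutate(OD_min = ODraw - min(ODraw)) %>%
  ungroup()
```

The per-well minimum is zero by construction at some timepoint of every well, whereas the medium-based estimate keeps differences between wells at the first timepoint. With the real data, the medium-based grouping would also include day and plate.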

4. Add an `OD` variable to your dataframe for background-subtracted OD. You
   need two things: 1) to `group` the data and 2) a way to point to the
   background wells. Since grouping takes a bit of practice until it becomes
   easy, I will just say that you need to subtract the background on each day,
   on each plate, at each timepoint. The wells with no bacteria were encoded as
   `uM = -1`, i.e. after appropriate grouping it comes down to
   `OD = OD - OD[uM == -1]`. The input [data](doc/tasks/03_dat.csv) is the same
   as in step 3 above. If you now plot everything exactly as in step 3, but with
   `OD` on the y-axis, here's what the [output](doc/tasks/04_out.pdf) should
   look like.

5. Constrain OD at the limit of detection. You might have noticed on the
   previous plot that some of the growth curves start at very low values. In
   fact, some of the ODs ended up negative. This is because the values are
   actually bounded below by the limit of detection (LOD): below it, the
   readout is essentially noise, so subtracting the background can push OD
   below zero. Experience shows that at OD~595~ with 30 µL/well in LB, the
   limit of detection is ~0.03. So the final step in deriving
   background-subtracted ODs is to constrain OD at 0.03. Again, multiple ways
   are possible; I would go for an `ifelse` statement. Here's what the
   resulting [output](doc/tasks/05_out.pdf) plot should look like.

6. Add a `Fit` variable to your dataframe for fitness. OD is a fine measure and much can be learned from staring at growth curves [[ref](https://www.annualreviews.org/doi/abs/10.1146/annurev.mi.03.100149.002103)].
   But we're interested in the effect of the drug, i.e. how much better or
   worse bacteria grow upon treatment. To that end, use the same grouping as
   for OD (on each day, on each plate, at each timepoint) and derive fitness as
   `Fit = OD/OD[uM == 0]`. Please also cap `Fit` at 1.1 (there's no real need
   for constraining fitness; it just makes the plots look nicer). Here's what
   the [output](doc/tasks/06_out.pdf) plot should look like if you now plot everything exactly as in the step above, but with `Fit` on the y-axis.