README.Rmd 6.84 KB
Newer Older
Vallo Varik's avatar
Vallo Varik committed
1
2
3
4
5
6
7
8
9
10
11
---
title: "32 drugs"
output: 
    md_document:
      preserve_yaml: FALSE
      fig_width: 7
      fig_height: 5
      toc: yes
      toc_depth: 2
---

Vallo Varik's avatar
Add toc    
Vallo Varik committed
12
13
14

[[_TOC_]]

Vallo Varik's avatar
Vallo Varik committed
15
# Motivation
Vallo Varik's avatar
Vallo Varik committed
16

Vallo Varik's avatar
Vallo Varik committed
17
Getting up to speed with R using dose-response for 32 drugs against 6
Vallo Varik's avatar
Vallo Varik committed
18
bacterial strains.
Vallo Varik's avatar
Vallo Varik committed
19
20


Vallo Varik's avatar
Vallo Varik committed
21
```{r setup, include=FALSE}
Vallo Varik's avatar
Vallo Varik committed
22
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE, dpi=300)
Vallo Varik's avatar
Vallo Varik committed
23
24
25
26
27
knitr::opts_knit$set(global.par = TRUE)
library('tidyverse')
```


Vallo Varik's avatar
Vallo Varik committed
28
29
# Tasks

Vallo Varik's avatar
Vallo Varik committed
30
In the following, we go through the most common steps in data analysis:
Vallo Varik's avatar
Vallo Varik committed
31
32
exploration, transformation (i.e. deriving new variables) and modeling.
Integral to all steps is visualization i.e. making graphs. 
Vallo Varik's avatar
Vallo Varik committed
33
34
35

## Explore

Vallo Varik's avatar
Vallo Varik committed
36
As a first look, the exploratory plots are informative and serve as a quality
Vallo Varik's avatar
Vallo Varik committed
37
control i.e. you check that there is nothing extra suspicious going on. Raw OD will suffice for that.
Vallo Varik's avatar
Vallo Varik committed
38

Vallo Varik's avatar
Vallo Varik committed
39
1. Plot growth curves following raw OD in time. Input
Vallo Varik's avatar
Vallo Varik committed
40
41
   [data](doc/tasks/01_dat.csv) is provided and expected output plot is shown below. The data is for azithromycin against _S. flexneri_ M90T
   from day 2022-05-04 (first replicate). _A tip: Use `facet_wrap` with `nrow = 1` argument to have different concentrations on separate plots._
Vallo Varik's avatar
Vallo Varik committed
42

Vallo Varik's avatar
Vallo Varik committed
43
44
   ![](doc/tasks/01_out.png)

Vallo Varik's avatar
Vallo Varik committed
45
46
2. Try again, now with [data](doc/tasks/02_dat.csv) from two days (let us plot
   days in different color). In addition, transform the y-axis to logarithmic
Vallo Varik's avatar
Vallo Varik committed
47
   scale. Expected output is shown below. _A tip: you need to turn
Vallo Varik's avatar
Vallo Varik committed
48
49
   the `Date` variable into a factor._

Vallo Varik's avatar
Vallo Varik committed
50
51
52
53
   ![](doc/tasks/02_out.png)

3. Once more, now with [data](doc/tasks/03_dat.csv) from three days. You will
   encounter an issue because there were two biological replicates on third
Vallo Varik's avatar
Vallo Varik committed
54
55
   day. There are multiple ways to overcome this. I recommend to
   solve it via `group` parameter of `aes` e.g. 
Vallo Varik's avatar
Vallo Varik committed
56
57
58
   `ggplot(aes(..., group = Plt))`.

   ![](doc/tasks/03_out.png)
Vallo Varik's avatar
Vallo Varik committed
59
60
61

## Transform

Vallo Varik's avatar
Vallo Varik committed
62
To quantify the growth (either rate or yield) one needs to subtract the
Vallo Varik's avatar
Vallo Varik committed
63
background from raw OD. There are two ways to do that: 1) using a readout from
Vallo Varik's avatar
Vallo Varik committed
64
65
66
just the medium; 2) using the smallest value per well (i.e. OD in one of the
first timepoints of a particular well). I prefer to use the former whenever
possible.
Vallo Varik's avatar
Vallo Varik committed
67

Vallo Varik's avatar
Vallo Varik committed
68
4. Add an `OD` variable to your dataframe for background subtracted OD. You
Vallo Varik's avatar
Vallo Varik committed
69
70
71
   need two things: 1) to `group` the data and 2) a way to point to background
   wells. Since grouping takes a bit practice until it becomes easy, I will
   just say that you need to subtract background on each day, on each plate,
Vallo Varik's avatar
Vallo Varik committed
72
   in each timepoint. The wells with no bacteria were encoded to have `uM = -1` i.e. after appropriate grouping it comes down to: `OD = OD/OD[uM == -1]`. Input [data](doc/tasks/03_dat.csv) is the same as in step 3 above.
Vallo Varik's avatar
Vallo Varik committed
73
   Finally, plot the result exactly as in step 3 above, except have OD on y-axis. I choose also to drop the background control (`uM == -1`).
Vallo Varik's avatar
Vallo Varik committed
74

Vallo Varik's avatar
Vallo Varik committed
75
76
77
   ![](doc/tasks/04_out.png)

5. Constrain the OD at limit of detection. You might notice on the
Vallo Varik's avatar
Vallo Varik committed
78
79
80
81
82
83
   previous plot that some of the growth curves start at very low values. In
   fact, some of the ODs ended up negative. This is because the values are
   actually lower bound by limit of detection (LOD). Experience tells that at
   OD~595~ with 30 µL/well in LB, the limit of detection is ~0.03. So the
   final step for deriving background subtracted ODs is to constrain OD at
   0.03. Multiple ways are again possible, I would go for `ifelse` statement.
Vallo Varik's avatar
Vallo Varik committed
84
   Finally, plot the result as you did above.
Vallo Varik's avatar
Vallo Varik committed
85
86

   ![](doc/tasks/05_out.png)
Vallo Varik's avatar
Vallo Varik committed
87
88
89
90
91
92

6. Add a `Fit` variable to your dataframe for fitness. OD is a fine measure and much can be learned staring at growth curves [[ref](https://www.annualreviews.org/doi/abs/10.1146/annurev.mi.03.100149.002103)].
   But we're interested in the effect of the drug i.e. how much
   better/worse do bacteria grow upon treatment. To that end, use the same
   grouping as for OD (on each day, on each plate, in each timepoint) and
   derive fitness as `OD = OD/OD[uM == 0]`. Please also  constrain `Fit` to
Vallo Varik's avatar
Vallo Varik committed
93
94
   1.1 (just for making plots look nicer). Finally, plot the result as in the
   step above, except have `Fit` on y-axis.
Vallo Varik's avatar
Vallo Varik committed
95
96
97
98
99
100
101

   ![output](doc/tasks/06_out.png) 


## Model

The distinction between transforming and modeling is a subtle one, arbitrary
Vallo Varik's avatar
Vallo Varik committed
102
103
really. Modeling usually entails slightly more sophisticated transformations
to summarize data and to ask questions e.g. 'is this different from that'.
Vallo Varik's avatar
Vallo Varik committed
104
105
106
107
108
109

Looking at above plot, it seems that about 8-10h would be a good time to
score the effect of a drug (fitness is maximally affected there for
concentrations/replicates). Let's check the closest timepoints to 10h:
 
```{r echo=T}
Vallo Varik's avatar
Vallo Varik committed
110
filter(dat, Time_h > 9.5, Time_h < 10.5) %>%
Vallo Varik's avatar
Vallo Varik committed
111
112
113
114
115
116
117
118
119
   count(Time_h)
```

Good, filtering for `Time_h` between 9.5 and 10.5 h gives a desired result.

7. Let us plot the dose-response curve. Dose-response curves are sigmoidal only
   if the dose axis is multiplicative i.e. logarithmic, so let us do that.
    
   ![](doc/tasks/07_out.png){width=60%}
120
121


Vallo Varik's avatar
Vallo Varik committed
122
123
124
125
126
127
128
8. Now we fit a curve to this data. We have to use some equation that describes
   the curve which is clearly not linear. We cannot avoid a little bit
   background here. Looks more complicated than it is, please read about:

   - The basics of a four parameter logistic regression [4PL](doc/4pl.md)
   - [Formula interface](doc/formulaR.md), a very powerful and indispensable
     tool for modelling in R.
Vallo Varik's avatar
Add drc    
Vallo Varik committed
129
130
   - Introduction to [drc](doc/drc.md) package, a dedicated package for
     DoseResponseCurves that allows one to fit them in no time.
Vallo Varik's avatar
Vallo Varik committed
131
  
Vallo Varik's avatar
Vallo Varik committed
132
   After all this reading, one must be hyngry for data analysis. Load `drc` library and see if you can make the plot below. There is one issue though: `drm` does not know how to handle `uM = -1` --  not a real concentration anyways, encoding we used for background control --  so get rid of that before fitting.
Vallo Varik's avatar
Vallo Varik committed
133
134
135
    
   ![](doc/tasks/08_out.png){width=60%}

Vallo Varik's avatar
Vallo Varik committed
136
137
138
139
9. Finally. Let us propose the MIC is the concentration at which the
   azithromycin excerts 95% effect. Can you calculate that with confidence
   intervals?

Vallo Varik's avatar
Vallo Varik committed
140
10. There is one more thing. An intricacy. We have fitted and plotted (i.e.
Vallo Varik's avatar
Vallo Varik committed
141
    think about) the `x`, the concentration, in logarithmic scale, but the
Vallo Varik's avatar
Vallo Varik committed
142
143
    IC~50~ is in linear scale. Mostly, it does not matter much. You can see
    above, however, that the lower confidence interval is 5x lower than IC~95~
Vallo Varik's avatar
Vallo Varik committed
144
    and the upper limit is less than 2x higher. One side is 5x away, the other
Vallo Varik's avatar
Vallo Varik committed
145
146
147
148
    less than 2x. To fix that, one could estimate IC~50~ in log scale
    (substitute IC~50~ in the 4-parameter logistic regression with
    log(IC~50~)). One might have to take some time to think about it what that means. Luckily, `drm` makes all this easy. You fit the model exactly as
    you did before, but this time, set `fct` to `LL2.4()`. Finally, when
Vallo Varik's avatar
Vallo Varik committed
149
150
    calculating MIC, the confidence interval should be set to "fls" 
    (`interval = "fls"`). 
Vallo Varik's avatar
Vallo Varik committed
151

Vallo Varik's avatar
Vallo Varik committed
152
153
    Now the MIC (IC~95~) should be the same you got with `LL.4` (33 µM), but
    the confidence intervals are symmetric, about 2x lower and 2x higher.