README.md 5.85 KB
Newer Older
Vallo Varik's avatar
Vallo Varik committed
1
2
3
4
5
6
7
# Motivation

Getting up to speed with R using dose-response for 32 drugs against 6
bacterial strains.

# Tasks

Vallo Varik's avatar
Vallo Varik committed
8
In the following, we go through the most common steps in data analysis:
Vallo Varik's avatar
Vallo Varik committed
9
10
exploration, transformation (i.e. deriving new variables) and modeling.
Integral to all steps is visualization i.e. making graphs.
Vallo Varik's avatar
Vallo Varik committed
11
12
13

## Explore

Vallo Varik's avatar
Vallo Varik committed
14
As a first look, the exploratory plots are informative and serve as a
Vallo Varik's avatar
Vallo Varik committed
15
quality control i.e. you check that there is nothing extra suspicious
Vallo Varik's avatar
Vallo Varik committed
16
going on. Raw OD will suffice for that.
Vallo Varik's avatar
Vallo Varik committed
17

Vallo Varik's avatar
Vallo Varik committed
18
1.  Plot growth curves following raw OD in time. Input
Vallo Varik's avatar
Vallo Varik committed
19
20
21
22
    [data](doc/tasks/01_dat.csv) is provided and expected output plot is
    shown below. The data is for azithromycin against *S. flexneri* M90T
    from day 2022-05-04 (first replicate). *A tip: Use `facet_wrap` with
    `nrow = 1` argument to have different concentrations on separate
Vallo Varik's avatar
Vallo Varik committed
23
24
25
    plots.*

    ![](doc/tasks/01_out.png)
Vallo Varik's avatar
Vallo Varik committed
26

Vallo Varik's avatar
Vallo Varik committed
27
28
2.  Try again, now with [data](doc/tasks/02_dat.csv) from two days (let
    us plot days in different color). In addition, transform the y-axis
Vallo Varik's avatar
Vallo Varik committed
29
30
31
32
    to logarithmic scale. Expected output is shown below. *A tip: you
    need to turn the `Date` variable into a factor.*

    ![](doc/tasks/02_out.png)
Vallo Varik's avatar
Vallo Varik committed
33

Vallo Varik's avatar
Vallo Varik committed
34
3.  Once more, now with [data](doc/tasks/03_dat.csv) from three days.
Vallo Varik's avatar
Vallo Varik committed
35
    You will encounter an issue because there were two biological
Vallo Varik's avatar
Vallo Varik committed
36
37
38
    replicates on third day. There are multiple ways to overcome this. I
    recommend to solve it via `group` parameter of `aes` e.g. 
    `ggplot(aes(..., group = Plt))`.
Vallo Varik's avatar
Vallo Varik committed
39
40

    ![](doc/tasks/03_out.png)
Vallo Varik's avatar
Vallo Varik committed
41
42
43

## Transform

Vallo Varik's avatar
Vallo Varik committed
44
To quantify the growth (either rate or yield) one needs to subtract the
Vallo Varik's avatar
Vallo Varik committed
45
46
background from raw OD. There are two ways to do that: 1) using a
readout from just the medium; 2) using the smallest value per well
Vallo Varik's avatar
Vallo Varik committed
47
48
(i.e. OD in one of the first timepoints of a particular well). I prefer
to use the former whenever possible.
Vallo Varik's avatar
Vallo Varik committed
49
50
51
52
53
54

1.  Add an `OD` variable to your dataframe for background subtracted OD.
    You need two things: 1) to `group` the data and 2) a way to point to
    background wells. Since grouping takes a bit practice until it
    becomes easy, I will just say that you need to subtract background
    on each day, on each plate, in each timepoint. The wells with no
Vallo Varik's avatar
Vallo Varik committed
55
56
    bacteria were encoded to have `uM = -1` i.e. after appropriate
    grouping it comes down to: `OD = OD/OD[uM == -1]`. Input
Vallo Varik's avatar
Vallo Varik committed
57
58
    [data](doc/tasks/03_dat.csv) is the same as in step 3 above.
    Finally, plot the result exactly as in step 3 above, except have OD
Vallo Varik's avatar
Vallo Varik committed
59
60
61
62
    on y-axis. I choose also to drop the background control
    (`uM == -1`).

    ![](doc/tasks/04_out.png)
Vallo Varik's avatar
Vallo Varik committed
63

Vallo Varik's avatar
Vallo Varik committed
64
65
2.  Constrain the OD at limit of detection. You might notice on the
    previous plot that some of the growth curves start at very low
Vallo Varik's avatar
Vallo Varik committed
66
67
68
69
70
    values. In fact, some of the ODs ended up negative. This is because
    the values are actually lower bound by limit of detection (LOD).
    Experience tells that at OD<sub>595</sub> with 30 µL/well in LB, the
    limit of detection is ~0.03. So the final step for deriving
    background subtracted ODs is to constrain OD at 0.03. Multiple ways
Vallo Varik's avatar
Vallo Varik committed
71
72
    are again possible, I would go for `ifelse` statement. Finally, plot
    the result as you did above.
Vallo Varik's avatar
Vallo Varik committed
73
74

    ![](doc/tasks/05_out.png)
Vallo Varik's avatar
Vallo Varik committed
75
76
77
78
79
80
81
82

3.  Add a `Fit` variable to your dataframe for fitness. OD is a fine
    measure and much can be learned staring at growth curves
    \[[ref](https://www.annualreviews.org/doi/abs/10.1146/annurev.mi.03.100149.002103)\].
    But we’re interested in the effect of the drug i.e. how much
    better/worse do bacteria grow upon treatment. To that end, use the
    same grouping as for OD (on each day, on each plate, in each
    timepoint) and derive fitness as `OD = OD/OD[uM == 0]`. Please also
Vallo Varik's avatar
Vallo Varik committed
83
84
    constrain `Fit` to 1.1 (just for making plots look nicer). Finally,
    plot the result as in the step above, except have `Fit` on y-axis.
Vallo Varik's avatar
Vallo Varik committed
85
86
87
88
89
90

    ![output](doc/tasks/06_out.png)

## Model

The distinction between transforming and modeling is a subtle one,
Vallo Varik's avatar
Vallo Varik committed
91
92
93
arbitrary really. Modeling usually entails slightly more sophisticated
transformations to summarize data and to ask questions e.g. ‘is this
different from that’.
Vallo Varik's avatar
Vallo Varik committed
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118

Looking at above plot, it seems that about 8-10h would be a good time to
score the effect of a drug (fitness is maximally affected there for
concentrations/replicates). Let’s check the closest timepoints to 10h:

    filter(dat, Time_h < 10.5, Time_h > 9.5) %>%
       count(Time_h)

    ## # A tibble: 4 × 4
    ## # Groups:   Date, Plt, Time_h [4]
    ##   Date         Plt Time_h     n
    ##   <date>     <dbl>  <dbl> <int>
    ## 1 2022-05-04     4   9.73    12
    ## 2 2022-05-05     2   9.73    12
    ## 3 2022-05-13     1   9.73    12
    ## 4 2022-05-13     6   9.74    12

Good, filtering for `Time_h` between 9.5 and 10.5 h gives a desired
result.

1.  Let us plot the dose-response curve. Dose-response curves are
    sigmoidal only if the dose axis is multiplicative i.e. logarithmic,
    so let us do that.

    <img src="doc/tasks/07_out.png" style="width:60.0%" />
119

Vallo Varik's avatar
Vallo Varik committed
120
121
122
123
124
125
126
127
128
2.  Now we fit a curve to this data. We have to use some equation that
    describes the curve which is clearly not linear. We cannot avoid a
    little bit background here. Looks more complicated than it is,
    please read about:

    -   The basics of a four parameter logistic regression
        [4PL](doc/4pl.md)
    -   [Formula interface](doc/formulaR.md), a very powerful and
        indispensable tool for modelling in R.
Vallo Varik's avatar
Add drc    
Vallo Varik committed
129
130
    -   Introduction to [drc](doc/drc.md) package, a dedicated package
        for DoseResponseCurves that allows one to fit them in no time.
Vallo Varik's avatar
Vallo Varik committed
131
132
133
134
135
136
137
138
139

    After all this reading, one must be hyngry for data analysis. Load
    `drc` library and see if you can make the plot below. There is one
    issue: `drm` does not know how to handle `uM = -1` – not a real
    concentration anyways, encoding we used for background control – so
    get rid of that before fitting.

    <img src="doc/tasks/08_out.png" style="width:60.0%" />