# Motivation

Getting up to speed with R using dose-response data for 32 drugs against
6 bacterial strains.

# Tasks

In the following, we go through the most common steps in data analysis:
exploration, transformation (i.e. deriving new variables) and modeling
(using statistical tools to answer questions: what happens on average,
which conditions are different and so on). Integral to all steps is
visualization i.e. making graphs.

## Explore

Exploratory plots give a first look at the data and double as quality
control, i.e. you check that nothing suspicious is going on. Raw OD
suffices for that.

1.  Plot growth curves, i.e. raw OD over time. The input
    [data](doc/tasks/01_dat.csv) and the expected
    [output](doc/tasks/01_out.pdf) plot are provided. The data are for
    azithromycin against *S. flexneri* M90T from 2022-05-04 (first
    replicate). *Tip: use `facet_wrap` with the `ncol = 1` argument to
    put different concentrations on separate plots.*

2.  Try again, now with [data](doc/tasks/02_dat.csv) from two days
    (plot the days in different colors). In addition, transform the
    y-axis to a logarithmic scale. Expected
    [output](doc/tasks/02_out.pdf). *Tip: you need to turn the `Date`
    variable into a factor.*

3.  Once more, now with [data](doc/tasks/03_dat.csv) from three days.
    Expected [output](doc/tasks/03_out.pdf). You will run into an issue
    because there were two biological replicates on the third day.
    There are multiple ways to overcome this, but for now I recommend
    using the `group` aesthetic of `aes`,
    e.g. `ggplot(aes(..., group = Plt))`.
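
The three plotting tasks above build on one another, so a single sketch can cover them. It runs on made-up data rather than the task files: the column names `Time_h` and `RawOD`, the plate IDs and the two later dates are assumptions for illustration (`Date`, `Plt` and `uM` are the names used above). It also assumes that `Plt` identifies a plate uniquely across days; if plate IDs repeat between days, `group = interaction(Date, Plt)` would be the safer choice.

```r
library(ggplot2)

set.seed(1)

# Made-up plate layout: one plate on each of the first two days and two
# plates (biological replicates) on the third day. Plate IDs are hypothetical.
plates <- data.frame(
  Date = c("2022-05-04", "2022-05-05", "2022-05-06", "2022-05-06"),
  Plt  = c("P1", "P2", "P3", "P4")
)

# Every plate gets the same timepoints and drug concentrations.
grid <- expand.grid(Time_h = 0:10, uM = c(0, 1, 4), Plt = plates$Plt,
                    stringsAsFactors = FALSE)
dat <- merge(grid, plates, by = "Plt")

# Toy growth curves: logistic shape, delayed by the drug, with some noise.
# RawOD is an assumed column name for the raw OD readout.
dat$RawOD <- 0.05 + 0.8 / (1 + exp(-(dat$Time_h - 4 - dat$uM))) *
  exp(rnorm(nrow(dat), sd = 0.03))

p <- ggplot(dat, aes(x = Time_h, y = RawOD,
                     color = factor(Date),  # discrete colors, one per day
                     group = Plt)) +        # one line per plate
  geom_line() +
  facet_wrap(~ uM, ncol = 1) +              # one panel per concentration
  scale_y_log10() +                         # logarithmic y-axis
  labs(x = "Time (h)", y = "Raw OD", color = "Date")

print(p)
```

Setting `group` explicitly replaces ggplot2's default grouping, which is why the plate IDs have to be unique across days for this to draw one line per plate.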

## Transform

To quantify growth (either rate or yield), one needs to subtract the
background from raw OD. There are two ways to do that: 1) using a
readout from wells with just the medium; 2) using the smallest value per
well (i.e. the OD at one of the first timepoints of that well). I prefer
the former whenever possible.
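
To make the difference between the two options concrete, here is a minimal sketch on a made-up single-plate data set. The column names `Well`, `Time_h` and `RawOD` and all values are assumptions for illustration; `uM = -1` marks the medium-only well, as in the tasks below.

```r
library(dplyr)

# Made-up readings for three wells over four timepoints.
dat <- data.frame(
  Well   = rep(c("A1", "A2", "A3"), each = 4),
  uM     = rep(c(-1, 0, 2), each = 4),
  Time_h = rep(0:3, times = 3),
  RawOD  = c(0.090, 0.091, 0.090, 0.092,   # medium only
             0.095, 0.120, 0.250, 0.600,   # bacteria, no drug
             0.094, 0.100, 0.130, 0.200)   # bacteria with drug
)

# Option 1: subtract the medium-only well measured at the same timepoint.
by_medium <- dat %>%
  group_by(Time_h) %>%
  mutate(OD = RawOD - RawOD[uM == -1]) %>%
  ungroup()

# Option 2: subtract the smallest value observed in each well.
by_well_min <- dat %>%
  group_by(Well) %>%
  mutate(OD = RawOD - min(RawOD)) %>%
  ungroup()
```

Option 1 follows drift in the medium over time but needs a medium-only well on every plate. Option 2 needs no extra wells but forces every curve to touch zero at its lowest point.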

1.  Add an `OD` variable to your dataframe for background-subtracted OD.
    You need two things: 1) to `group` the data and 2) a way to point to
    the background wells. Since grouping takes a bit of practice until
    it becomes easy, I will just say that you need to subtract the
    background on each day, on each plate, at each timepoint. The wells
    with no bacteria are encoded as `uM = -1`, i.e. after appropriate
    grouping it comes down to a subtraction:
    `OD = OD - OD[uM == -1]`, where the `OD` on the right-hand side is
    the raw OD. The input [data](doc/tasks/03_dat.csv) is the same as in
    step 3 above. If you now plot everything exactly as in step 3
    above, except with `OD` on the y-axis, here’s what the
    [output](doc/tasks/04_out.pdf) should look like.

2.  Constrain the OD at the limit of detection. You might have noticed
    on the previous plot that some of the growth curves start at very
    low values. In fact, some of the ODs ended up negative. This is
    because measured values are bounded from below by the limit of
    detection (LOD). Experience shows that at OD<sub>595</sub> with
    30 µL/well in LB, the limit of detection is ~0.03. So the final
    step in deriving background-subtracted ODs is to constrain OD at
    0.03, i.e. to set every value below 0.03 to 0.03. Multiple ways are
    again possible; I would go for an `ifelse` statement. Here’s what
    the resulting [output](doc/tasks/05_out.pdf) plot should look like.

3.  Add a `Fit` variable to your dataframe for fitness. OD is a fine
    measure and much can be learned from staring at growth curves
    \[[ref](https://www.annualreviews.org/doi/abs/10.1146/annurev.mi.03.100149.002103)\].
    But we’re interested in the effect of the drug, i.e. how much
    better or worse bacteria grow upon treatment. To that end, use the
    same grouping as for OD (on each day, on each plate, at each
    timepoint) and derive fitness as `Fit = OD/OD[uM == 0]`. Please
    also constrain `Fit` to at most 1.1 (there’s no real need to
    constrain fitness; it just makes the plots look nicer). Here’s what
    the [output](doc/tasks/06_out.pdf) plot should look like if you now
    plot everything exactly as in the step above, except with `Fit` on
    the y-axis.
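
The three transform steps chain naturally into one grouped `mutate`. What follows is a sketch on made-up data, not the reference solution: `RawOD` and `Time_h` are assumed column names, the plate IDs, the second date and all readings are invented, and the code assumes exactly one medium-only well (`uM == -1`) and one no-drug well (`uM == 0`) per day, plate and timepoint. It takes "background-subtracted" literally and subtracts the medium readout. With several background or no-drug wells per group, something like `mean(RawOD[uM == -1])` would be needed instead.

```r
library(dplyr)

lod     <- 0.03  # limit of detection quoted in the text
fit_cap <- 1.1   # upper bound on fitness, only to keep plots tidy

# Made-up data: two plates, two timepoints, three wells each.
# expand.grid varies uM fastest, then Time_h, then Plt.
dat <- expand.grid(uM = c(-1, 0, 2), Time_h = c(0, 5),
                   Plt = c("P1", "P2"), stringsAsFactors = FALSE)
dat$Date  <- ifelse(dat$Plt == "P1", "2022-05-04", "2022-05-05")
dat$RawOD <- c(0.09, 0.08, 0.10,   # P1, t = 0: medium, no drug, drug
               0.09, 0.59, 0.29,   # P1, t = 5
               0.10, 0.11, 0.11,   # P2, t = 0
               0.10, 0.70, 0.80)   # P2, t = 5

res <- dat %>%
  group_by(Date, Plt, Time_h) %>%          # each day, each plate, each timepoint
  mutate(
    OD  = RawOD - RawOD[uM == -1],         # subtract the medium-only well
    OD  = ifelse(OD < lod, lod, OD),       # constrain OD at the LOD
    Fit = OD / OD[uM == 0],                # growth relative to the no-drug well
    Fit = ifelse(Fit > fit_cap, fit_cap, Fit)
  ) %>%
  ungroup()

print(res)
```

In this toy data set the drug well on plate `P1` at 5 h ends up at a fitness of about 0.4, while the drug well on plate `P2` at 5 h grows better than the no-drug control and is capped at 1.1.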