Vallo Varik committed May 16, 2022 1 2 3 4 5 6 7 ``````# Motivation Getting up to speed with R using dose-response for 32 drugs against 6 bacterial strains. # Tasks `````` Vallo Varik committed May 17, 2022 8 ``````In the following, we go through the most common steps in data analysis: `````` Vallo Varik committed May 18, 2022 9 10 ``````exploration, transformation (i.e. deriving new variables) and modeling. Integral to all steps is visualization i.e. making graphs. `````` Vallo Varik committed May 16, 2022 11 12 13 `````` ## Explore `````` Vallo Varik committed May 17, 2022 14 ``````As a first look, the exploratory plots are informative and serve as a `````` Vallo Varik committed May 17, 2022 15 ``````quality control i.e. you check that there is nothing extra suspicious `````` Vallo Varik committed May 17, 2022 16 ``````going on. Raw OD will suffice for that. `````` Vallo Varik committed May 17, 2022 17 `````` `````` Vallo Varik committed May 17, 2022 18 ``````1. Plot growth curves following raw OD in time. Input `````` Vallo Varik committed May 18, 2022 19 20 21 22 `````` [data](doc/tasks/01_dat.csv) is provided and expected output plot is shown below. The data is for azithromycin against *S. flexneri* M90T from day 2022-05-04 (first replicate). *A tip: Use `facet_wrap` with `nrow = 1` argument to have different concentrations on separate `````` Vallo Varik committed May 17, 2022 23 24 25 `````` plots.* ![](doc/tasks/01_out.png) `````` Vallo Varik committed May 16, 2022 26 `````` `````` Vallo Varik committed May 17, 2022 27 28 ``````2. Try again, now with [data](doc/tasks/02_dat.csv) from two days (let us plot days in different color). In addition, transform the y-axis `````` Vallo Varik committed May 17, 2022 29 30 31 32 `````` to logarithmic scale. Expected output is shown below. *A tip: you need to turn the `Date` variable into a factor.* ![](doc/tasks/02_out.png) `````` Vallo Varik committed May 17, 2022 33 `````` `````` Vallo Varik committed May 17, 2022 34 ``````3. Once more, now with [data](doc/tasks/03_dat.csv) from three days. `````` Vallo Varik committed May 17, 2022 35 `````` You will encounter an issue because there were two biological `````` Vallo Varik committed May 18, 2022 36 37 38 `````` replicates on third day. There are multiple ways to overcome this. I recommend to solve it via `group` parameter of `aes` e.g.  `ggplot(aes(..., group = Plt))`. `````` Vallo Varik committed May 17, 2022 39 40 `````` ![](doc/tasks/03_out.png) `````` Vallo Varik committed May 17, 2022 41 42 43 `````` ## Transform `````` Vallo Varik committed May 17, 2022 44 ``````To quantify the growth (either rate or yield) one needs to subtract the `````` Vallo Varik committed May 17, 2022 45 46 ``````background from raw OD. There are two ways to do that: 1) using a readout from just the medium; 2) using the smallest value per well `````` Vallo Varik committed May 17, 2022 47 48 ``````(i.e. OD in one of the first timepoints of a particular well). I prefer to use the former whenever possible. `````` Vallo Varik committed May 17, 2022 49 50 51 52 53 54 `````` 1. Add an `OD` variable to your dataframe for background subtracted OD. You need two things: 1) to `group` the data and 2) a way to point to background wells. Since grouping takes a bit practice until it becomes easy, I will just say that you need to subtract background on each day, on each plate, in each timepoint. The wells with no `````` Vallo Varik committed May 17, 2022 55 56 `````` bacteria were encoded to have `uM = -1` i.e. after appropriate grouping it comes down to: `OD = OD/OD[uM == -1]`. Input `````` Vallo Varik committed May 18, 2022 57 58 `````` [data](doc/tasks/03_dat.csv) is the same as in step 3 above. Finally, plot the result exactly as in step 3 above, except have OD `````` Vallo Varik committed May 17, 2022 59 60 61 62 `````` on y-axis. I choose also to drop the background control (`uM == -1`). ![](doc/tasks/04_out.png) `````` Vallo Varik committed May 17, 2022 63 `````` `````` Vallo Varik committed May 17, 2022 64 65 ``````2. Constrain the OD at limit of detection. You might notice on the previous plot that some of the growth curves start at very low `````` Vallo Varik committed May 17, 2022 66 67 68 69 70 `````` values. In fact, some of the ODs ended up negative. This is because the values are actually lower bound by limit of detection (LOD). Experience tells that at OD595 with 30 µL/well in LB, the limit of detection is ~0.03. So the final step for deriving background subtracted ODs is to constrain OD at 0.03. Multiple ways `````` Vallo Varik committed May 18, 2022 71 72 `````` are again possible, I would go for `ifelse` statement. Finally, plot the result as you did above. `````` Vallo Varik committed May 17, 2022 73 74 `````` ![](doc/tasks/05_out.png) `````` Vallo Varik committed May 17, 2022 75 76 77 78 79 80 81 82 `````` 3. Add a `Fit` variable to your dataframe for fitness. OD is a fine measure and much can be learned staring at growth curves \[[ref](https://www.annualreviews.org/doi/abs/10.1146/annurev.mi.03.100149.002103)\]. But we’re interested in the effect of the drug i.e. how much better/worse do bacteria grow upon treatment. To that end, use the same grouping as for OD (on each day, on each plate, in each timepoint) and derive fitness as `OD = OD/OD[uM == 0]`. Please also `````` Vallo Varik committed May 18, 2022 83 84 `````` constrain `Fit` to 1.1 (just for making plots look nicer). Finally, plot the result as in the step above, except have `Fit` on y-axis. `````` Vallo Varik committed May 17, 2022 85 86 87 88 89 90 `````` ![output](doc/tasks/06_out.png) ## Model The distinction between transforming and modeling is a subtle one, `````` Vallo Varik committed May 18, 2022 91 92 93 ``````arbitrary really. Modeling usually entails slightly more sophisticated transformations to summarize data and to ask questions e.g. ‘is this different from that’. `````` Vallo Varik committed May 31, 2022 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 `````` Looking at above plot, it seems that about 8-10h would be a good time to score the effect of a drug (fitness is maximally affected there for concentrations/replicates). Let’s check the closest timepoints to 10h: filter(dat, Time_h < 10.5, Time_h > 9.5) %>% count(Time_h) ## # A tibble: 4 × 4 ## # Groups: Date, Plt, Time_h [4] ## Date Plt Time_h n ## ## 1 2022-05-04 4 9.73 12 ## 2 2022-05-05 2 9.73 12 ## 3 2022-05-13 1 9.73 12 ## 4 2022-05-13 6 9.74 12 Good, filtering for `Time_h` between 9.5 and 10.5 h gives a desired result. 1. Let us plot the dose-response curve. Dose-response curves are sigmoidal only if the dose axis is multiplicative i.e. logarithmic, so let us do that. `````` Vallo Varik committed May 31, 2022 119 `````` `````` Vallo Varik committed Jun 07, 2022 120 121 122 123 124 125 126 127 128 ``````2. Now we fit a curve to this data. We have to use some equation that describes the curve which is clearly not linear. We cannot avoid a little bit background here. Looks more complicated than it is, please read about: - The basics of a four parameter logistic regression [4PL](doc/4pl.md) - [Formula interface](doc/formulaR.md), a very powerful and indispensable tool for modelling in R. `````` Vallo Varik committed Jun 07, 2022 129 130 `````` - Introduction to [drc](doc/drc.md) package, a dedicated package for DoseResponseCurves that allows one to fit them in no time. `````` Vallo Varik committed Jun 07, 2022 131 132 133 134 135 136 137 138 139 `````` After all this reading, one must be hyngry for data analysis. Load `drc` library and see if you can make the plot below. There is one issue: `drm` does not know how to handle `uM = -1` – not a real concentration anyways, encoding we used for background control – so get rid of that before fitting. ``````