# Motivation

Getting up to speed with R using dose-response for 32 drugs against 6
bacterial strains.

# Tasks

In the following, we go through the most common steps in data analysis:
exploration, transformation (i.e. deriving new variables) and modeling
(using statistical tools to answer questions: what happens on average,
which conditions are different and so on). Integral to all steps is
visualization, i.e. making graphs.

## Explore

As a first look, exploratory plots are informative and serve as
quality control, i.e. you check that nothing suspicious is
going on. Raw OD will suffice for that.

1. Plot growth curves following raw OD over time. Input
   [data](doc/tasks/01_dat.csv) and expected
   [output](doc/tasks/01_out.pdf) plot are provided. The data is for
   azithromycin against *S. flexneri* M90T from day 2022-05-04 (first
   replicate). *A tip: Use `facet_wrap` with the `ncol = 1` argument to
   have different concentrations on separate plots.*

2. Try again, now with [data](doc/tasks/02_dat.csv) from two days (let
   us plot days in different colors). In addition, transform the y-axis
   to logarithmic scale. Expected [output](doc/tasks/02_out.pdf).
   *A tip: you need to turn the `Date` variable into a factor.*

3. Once more, now with [data](doc/tasks/03_dat.csv) from three days.
   Expected [output](doc/tasks/03_out.pdf). You will encounter an issue
   because there were two biological replicates on the third day. There
   are multiple ways to overcome this, but for now, I recommend solving
   it by using the `group` parameter of `aes`, e.g.
   `ggplot(aes(..., group = Plt))`.

## Transform

To quantify growth (either rate or yield), one needs to subtract the
background from raw OD. There are two ways to do that: 1) using a
readout from just the medium; 2) using the smallest value per well
(i.e. OD in one of the first timepoints of a particular well). I prefer
to use the former whenever possible.

1. Add an `OD` variable to your dataframe for background-subtracted OD.
   You need two things: 1) to `group` the data and 2) a way to point to
   background wells. Since grouping takes a bit of practice until it
   becomes easy, I will just say that you need to subtract background on
   each day, on each plate, at each timepoint. The wells with no
   bacteria were encoded to have `uM = -1`, i.e. after appropriate
   grouping it comes down to: `OD = OD - OD[uM == -1]`. Input
   [data](doc/tasks/03_dat.csv) is the same as in step 3 above.
   And if you now plot everything exactly as in step 3 above, except
   having OD on the y-axis, here’s what
   [output](doc/tasks/04_out.pdf) should look like.

2. Constrain the OD at the limit of detection. You might have noticed
   on the previous plot that some of the growth curves start at very low
   values. In fact, some of the ODs ended up negative. This is because
   the values are actually bounded from below by the limit of detection
   (LOD). Experience tells that at OD595 with 30 µL/well in LB, the
   limit of detection is ~0.03. So the final step for deriving
   background-subtracted ODs is to constrain OD at 0.03. Multiple ways
   are again possible; I would go for an `ifelse` statement. Here’s what
   the resulting [output](doc/tasks/05_out.pdf) plot should look like.

3. Add a `Fit` variable to your dataframe for fitness. OD is a fine
   measure and much can be learned by staring at growth curves
   \[[ref](https://www.annualreviews.org/doi/abs/10.1146/annurev.mi.03.100149.002103)\].
   But we’re interested in the effect of the drug, i.e. how much
   better/worse bacteria grow upon treatment. To that end, use the
   same grouping as for OD (on each day, on each plate, at each
   timepoint) and derive fitness as `Fit = OD/OD[uM == 0]`. Please also
   constrain `Fit` to 1.1 (there’s no real need for constraining
   fitness, it is just for making plots look nicer). Here’s what the
   [output](doc/tasks/06_out.pdf) plot should look like if you now plot
   everything exactly as in the step above, except having `Fit` on
   the y-axis.
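
The three Transform steps above can be sketched as a single grouped `mutate` with `dplyr`. This is a minimal sketch on made-up numbers, not the reference solution. The column names `Time` and `RawOD` are assumptions (the task files may name them differently), while `Date`, `Plt`, `uM`, `OD` and `Fit` follow the text. Taking `mean()` of the reference wells is also an assumption, there to cover groups with more than one background or untreated well. Constraining `Fit` to 1.1 is read here as capping it from above.

```r
library(dplyr)

# Toy data: one day, one plate, two timepoints, four wells per timepoint.
# `Time` and `RawOD` are hypothetical column names.
dat <- data.frame(
  Date  = "2022-05-04",
  Plt   = 1,
  Time  = rep(c(0, 1), each = 4),
  uM    = rep(c(-1, 0, 1, 5), times = 2),   # -1 = no bacteria, 0 = no drug
  RawOD = c(0.10, 0.12, 0.12, 0.11,
            0.10, 0.50, 0.60, 0.30)
)

lod <- 0.03  # limit of detection quoted in the text

out <- dat %>%
  # Background and untreated references are defined per day, plate, timepoint
  group_by(Date, Plt, Time) %>%
  mutate(
    # Step 1: subtract the medium-only background
    OD  = RawOD - mean(RawOD[uM == -1]),
    # Step 2: constrain OD at the limit of detection
    OD  = ifelse(OD < lod, lod, OD),
    # Step 3: fitness relative to the untreated control, capped at 1.1
    Fit = OD / mean(OD[uM == 0]),
    Fit = ifelse(Fit > 1.1, 1.1, Fit)
  ) %>%
  ungroup()

print(out)
```

Within one `mutate`, later expressions see the columns created by earlier ones, so `Fit` is computed from the already constrained `OD`. Note that the background wells (`uM == -1`) also receive a `Fit` value here; you may want to filter them out before plotting.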