Commit be757ad9 authored by Jakob Wirbel

Update vignettes for the new internal use of mlr3.

parent db85a966
@@ -42,22 +42,22 @@ library("ggpubr")
# Preparations
There are two different ways to access the data for our example dataset. On
-the one hand, it is available through the `curatedMetagenomicsData` R package.
+the one hand, it is available through the `curatedMetagenomicData` R package.
However, using it here would add many more dependencies to the `SIAMCAT`
package.
Therefore, we instead use data available through the EMBL cluster.
In the `SIAMCAT` paper, we performed the presented analyses on the datasets
-available through `curatedMetagenomicsData`. If you want to reproduce the
+available through `curatedMetagenomicData`. If you want to reproduce the
analysis from the `SIAMCAT` paper, you can execute the code chunks in
-the `curatedMetageomicsData` section, otherwise execute the code in the
+the `curatedMetagenomicData` section, otherwise execute the code in the
mOTUs2 section.
## curatedMetagenomicData
First, we load the package:
```{r curateMGD, eval=FALSE}
-library("curatedMetagenomciData")
+library("curatedMetagenomicData")
```
### Metadata
@@ -211,14 +211,13 @@ sc.obj <- filter.features(sc.obj, cutoff=0.05,
## Association Plot
The `check.associations` function calculates the significance of enrichment and
-metrics of association (such as generalized fold change and single-feautre
+metrics of association (such as generalized fold change and single-feature
AUROC).
```{r assoc_plot, message=FALSE, warning=FALSE}
-sc.obj <- check.associations(sc.obj, detect.lim = 1e-06, alpha=0.1,
-                             max.show = 20, plot.type = 'quantile.rect',
-                             panels = c('fc'),
-                             fn.plot = './association_plot_nielsen.pdf')
+sc.obj <- check.associations(sc.obj, log.n0 = 1e-06, alpha=0.1)
+association.plot(sc.obj, fn.plot = './association_plot_nielsen.pdf',
+                 panels = c('fc'))
```
![](./association_plot_nielsen.png)
@@ -336,10 +335,7 @@ First, we can use `SIAMCAT` to test for associations including the Danish
samples.
```{r assoc_plot_2, warning=FALSE, message=FALSE}
-sc.obj.full <- check.associations(sc.obj.full, detect.lim = 1e-06, alpha=0.1,
-                                  max.show = 20,
-                                  plot.type = 'quantile.rect',
-                                  fn.plot = './association_plot_dnk.pdf')
+sc.obj.full <- check.associations(sc.obj.full, log.n0 = 1e-06, alpha=0.1)
```
Confounders can lead to biases in association testing. After using `SIAMCAT` to
@@ -364,7 +360,7 @@ df.plot %>%
theme_classic() +
theme(panel.grid.major = element_line(colour='lightgrey'),
aspect.ratio = 1.3) +
-scale_colour_manual(values=c('darkgrey', '#D41645'), guide=FALSE) +
+scale_colour_manual(values=c('darkgrey', '#D41645'), guide='none') +
annotate('text', x=0.7, y=8, label='Dorea formicigenerans')
```
@@ -99,8 +99,8 @@ for (d in datasets){
# create SIAMCAT object
sc.obj <- siamcat(feat=feat, meta=meta.train, label='Group', case='CD')
# test for associations
-sc.obj <- check.associations(sc.obj, detect.lim = 1e-05,
-                             feature.type = 'original',fn.plot = paste0('./assoc_plot_', d, '.pdf'))
+sc.obj <- check.associations(sc.obj, log.n0=1e-05,
+                             feature.type = 'original')
# extract the associations and save them in the assoc.list
temp <- associations(sc.obj)
temp$genus <- rownames(temp)
@@ -303,7 +303,8 @@ bind_rows(auroc.all, test.average) %>%
mutate(split=factor(split, levels = c('none', 'Average'))) %>%
# convert to factor to enforce ordering
mutate(study.train=factor(study.train, levels=c(datasets, 'Average'))) %>%
-mutate(study.test=factor(study.test, levels=c(rev(datasets),'Average'))) %>%
+mutate(study.test=factor(study.test,
+                         levels=c(rev(datasets),'Average'))) %>%
ggplot(aes(y=study.test, x=study.train, fill=AUC, size=CV, color=CV)) +
geom_tile() + theme_minimal() +
# text in tiles
@@ -117,14 +117,9 @@ the significance using a non-parametric Wilcoxon test and different effect
sizes for the association (e.g. AUC or fold change).
```{r check_associations, eval=FALSE}
-sc.obj <- check.associations(
-    sc.obj,
-    sort.by = 'fc',
-    alpha = 0.05,
-    mult.corr = "fdr",
-    detect.lim = 10^-6,
-    plot.type = "quantile.box",
-    panels = c("fc", "prevalence", "auroc"))
+sc.obj <- check.associations(sc.obj, log.n0 = 1e-06, alpha = 0.05)
+association.plot(sc.obj, sort.by = 'fc',
+                 panels = c('fc', 'prevalence', 'auroc'))
```
The function produces a pdf file as output, since the plot is optimized for a
@@ -143,12 +138,8 @@ in the `SIAMCAT` object, but the different analyses are visualized and saved to
a combined pdf file for qualitative interpretation.
```{r check_confounders, eval=FALSE}
-sc.obj <- check.confounders(
-    sc.obj,
-    fn.plot = 'confounder_plots.pdf',
-    meta.in = NULL,
-    feature.type = 'filtered'
-)
+check.confounders(sc.obj, fn.plot = 'confounder_plots.pdf',
+                  meta.in = NULL, feature.type = 'filtered')
```
The conditional entropy check primarily serves to remove nonsensical
@@ -183,15 +174,8 @@ we use the `log.unit` method, but several other methods and customization
options are available (please check the documentation).
```{r normalize_feat}
-sc.obj <- normalize.features(
-    sc.obj,
-    norm.method = "log.unit",
-    norm.param = list(
-        log.n0 = 1e-06,
-        n.p = 2,
-        norm.margin = 1
-    )
-)
+sc.obj <- normalize.features(sc.obj, norm.method = "log.unit",
+                             norm.param = list(log.n0 = 1e-06, n.p = 2,
+                                               norm.margin = 1))
```
## Prepare Cross-Validation
@@ -204,11 +188,7 @@ scheme. The data-split will be saved in the `data_split` slot of the `SIAMCAT`
object.
```{r data_split}
-sc.obj <- create.data.split(
-    sc.obj,
-    num.folds = 5,
-    num.resample = 2
-)
+sc.obj <- create.data.split(sc.obj, num.folds = 5, num.resample = 2)
```
## Model Training
@@ -219,10 +199,7 @@ machine learning method to the measure for model selection or customizable
parameter set for hyperparameter tuning.
```{r train_model, message=FALSE, results='hide'}
-sc.obj <- train.model(
-    sc.obj,
-    method = "lasso"
-)
+sc.obj <- train.model(sc.obj, method = "lasso")
```
The models are saved in the `model_list` slot of the `SIAMCAT` object. The
@@ -235,7 +212,7 @@ model_type(sc.obj)
# access the models
models <- models(sc.obj)
-models[[1]]
+models[[1]]$model
```
## Make Predictions
@@ -299,13 +276,8 @@ The function again produces a pdf-file optimized for a landscape DIN-A4
plotting region.
```{r eval=FALSE}
-model.interpretation.plot(
-    sc.obj,
-    fn.plot = 'interpretation.pdf',
-    consens.thres = 0.5,
-    limits = c(-3, 3),
-    heatmap.type = 'zscore',
-)
+model.interpretation.plot(sc.obj, fn.plot = 'interpretation.pdf',
+                          consens.thres = 0.5, limits = c(-3, 3),
+                          heatmap.type = 'zscore')
```
The resulting plot looks like this:
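The last hunk changes model access from `models[[1]]` to `models[[1]]$model`: with mlr3 used internally, each entry returned by `models()` is a wrapper object, and the underlying fit lives in its `$model` field. A minimal sketch of the updated access pattern, assuming a `SIAMCAT` object `sc.obj` on which `train.model()` has already been run (the variable name `fitted.model` is introduced here for illustration):

```r
library("SIAMCAT")

# Assumes `sc.obj` is a SIAMCAT object with trained models
# (see the train.model() chunk above).
models <- models(sc.obj)

# Each list entry now wraps an mlr3 learner; the underlying fit
# (e.g. the glmnet object for method = "lasso") is in $model.
fitted.model <- models[[1]]$model
class(fitted.model)
```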