diff --git a/HTM_2016.bib b/HTM_2016.bib
index b2673444576d37b1960acf86395c6bc584d8fce6..af7856d178e2e7604a4a9de2a315c7823dc38ede 100644
--- a/HTM_2016.bib
+++ b/HTM_2016.bib
@@ -11,3 +11,17 @@
   title = {Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes},
   journal = {Nature},
 }
+
+@Article{Sommer_2013,
+  doi = {10.1093/bioinformatics/btt175},
+  url = {http://dx.doi.org/10.1093/bioinformatics/btt175},
+  year = {2013},
+  month = {apr},
+  publisher = {Oxford University Press ({OUP})},
+  volume = {29},
+  number = {12},
+  pages = {1580--1582},
+  author = {C. Sommer and M. Held and B. Fischer and W. Huber and D. W. Gerlich},
+  title = {{CellH}5: a format for data exchange in high-content screening},
+  journal = {Bioinformatics},
+}
diff --git a/Tutorial_HTM_2016.Rmd b/Tutorial_HTM_2016.Rmd
index 9ee0beb40dfb0f47dded7072ee4abb56e8aaec82..69e3c6df14a3509360cd8ffe9d54c5dcb4fdedef 100755
--- a/Tutorial_HTM_2016.Rmd
+++ b/Tutorial_HTM_2016.Rmd
@@ -62,7 +62,10 @@ to the one in @Neumann_2010.
 
 # Annotation import
 
-```{r}
+We first import the annotation of the plate. This consists of a table that informs
+us about the content of every single well on the plate.
+
+```{r import_annotation}
 data_path <- "~/p12_data"
 plate_map <- read.xlsx(xlsxFile = file.path(data_path, "plate_mapping.xlsx"))
 head(plate_map)
@@ -71,10 +74,20 @@ head(plate_map)
 
 # Importing the raw data
 
-* importing using `r Biocpkg("rhdf5")`
-* possibly discuss the hdf5 format
+We will now import the raw data. This data is stored in a variant of the [HDF5 format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) called
+[CellH5](http://www.cellh5.org/),
+which defines a more restricted sub-format designed specifically to store data
+from high-content screens. More information can be found in the paper by
+@Sommer_2013.
+
+In the code below, we use the [cellh5](https://github.com/CellH5/cellh5-R) R package
+to import the data.
The file `_all_positions.ch5` contains links to the other `ch5`
+files that contain the full data of the plate.
+We are only interested in the predictions produced
+by the machine learning algorithm, so we only extract this part of the file.
 
-```{r readingCellH5, eval=FALSE}
+
+```{r readingCellH5, dependson="plate_map", eval=FALSE}
 path <- file.path(data_path, "_all_positions.ch5")
 c5f <- CellH5(path)
 c5_pos <- C5Positions(c5f, C5Plates(c5f))
@@ -85,7 +98,9 @@ c5_pos[["WB08_P1"]] <- NULL
 ```
 
 
-# Extract raw data
+# Tabulate the raw data
+
+We now tabulate the imported data.
 
 ```{r, eval=FALSE}
 
diff --git a/Tutorial_HTM_2016.html b/Tutorial_HTM_2016.html
index 88e11df008c409e61ddef992115123b1c3ff7495..ce53f48914c546bb1d490eef5fcaa19263201c9c 100644
--- a/Tutorial_HTM_2016.html
+++ b/Tutorial_HTM_2016.html
@@ -89,7 +89,7 @@ document.addEventListener("DOMContentLoaded", function() {
 <li><a href="#about-the-tutorial"><span class="toc-section-number">2</span> About the tutorial</a></li>
 <li><a href="#annotation-import"><span class="toc-section-number">3</span> Annotation import</a></li>
 <li><a href="#importing-the-raw-data"><span class="toc-section-number">4</span> Importing the raw data</a></li>
-<li><a href="#extract-raw-data"><span class="toc-section-number">5</span> Extract raw data</a></li>
+<li><a href="#tabulate-the-raw-data"><span class="toc-section-number">5</span> Tabulate the raw data</a></li>
 <li><a href="#the-concept-of-tidy-data"><span class="toc-section-number">6</span> The concept of tidy data</a></li>
 <li><a href="#reshaping-the-screen-data"><span class="toc-section-number">7</span> Reshaping the screen data</a><ul>
 <li><a href="#plotting-in-r-ggplot2"><span class="toc-section-number">7.1</span> Plotting in R: ggplot2</a></li>
@@ -126,6 +126,7 @@ rmarkdown::render('Tutorial_HTM_2016.Rmd', BiocStyle::pdf_document())
 </div>
 <div id="annotation-import" class="section level1">
 <h1><span class="header-section-number">3</span> Annotation import</h1>
+<p>We first import the
annotation of the plate. This consists of a table that informs us about the content of every single well on the plate.</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">data_path <-<span class="st"> "~/p12_data"</span>
 plate_map <-<span class="st"> </span><span class="kw">read.xlsx</span>(<span class="dt">xlsxFile =</span> <span class="kw">file.path</span>(data_path, <span class="st">"plate_mapping.xlsx"</span>))
 <span class="kw">head</span>(plate_map)</code></pre></div>
@@ -139,10 +140,8 @@ plate_map <-<span class="st"> </span><span class="kw">read.xlsx</span>(<span
 </div>
 <div id="importing-the-raw-data" class="section level1">
 <h1><span class="header-section-number">4</span> Importing the raw data</h1>
-<ul>
-<li>importing using <em><a href="http://bioconductor.org/packages/rhdf5">rhdf5</a></em></li>
-<li>possibly discuss the hdf5 format</li>
-</ul>
+<p>We will now import the raw data. This data is stored in a variant of the <a href="https://en.wikipedia.org/wiki/Hierarchical_Data_Format">HDF5 format</a> called <a href="http://www.cellh5.org/">CellH5</a>, which defines a more restricted sub-format designed specifically to store data from high-content screens. More information can be found in the paper by <span class="citation">Sommer et al. (2013)</span>.</p>
+<p>In the code below, we use the <a href="https://github.com/CellH5/cellh5-R">cellh5</a> R package to import the data. The file <code>_all_positions.ch5</code> contains links to the other <code>ch5</code> files that contain the full data of the plate.
We are only interested in the predictions produced by the machine learning algorithm, so we only extract this part of the file.</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">path <-<span class="st"> </span><span class="kw">file.path</span>(data_path, <span class="st">"_all_positions.ch5"</span>)
 c5f <-<span class="st"> </span><span class="kw">CellH5</span>(path)
 c5_pos <-<span class="st"> </span><span class="kw">C5Positions</span>(c5f, <span class="kw">C5Plates</span>(c5f))
@@ -150,8 +149,9 @@ predictions <-<span class="st"> </span><span class="kw">C5Predictions</span>(
 
 c5_pos[[<span class="st">"WB08_P1"</span>]] <-<span class="st"> </span><span class="ot">NULL</span></code></pre></div>
 </div>
-<div id="extract-raw-data" class="section level1">
-<h1><span class="header-section-number">5</span> Extract raw data</h1>
+<div id="tabulate-the-raw-data" class="section level1">
+<h1><span class="header-section-number">5</span> Tabulate the raw data</h1>
+<p>We now tabulate the imported data.</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">raw_data <-<span class="st"> </span><span class="kw">sapply</span>(c5_pos, function(pos){
 predictions <-<span class="st"> </span><span class="kw">C5Predictions</span>(c5f, pos, <span class="dt">mask =</span> <span class="st">"primary__primary3"</span>, <span class="dt">as =</span> <span class="st">"name"</span>)
 <span class="kw">table</span>(predictions)}
diff --git a/get_citations_HTM_2016.R b/get_citations_HTM_2016.R
index ce1dfba472ff1dabf776b3852107d580c334b366..f2343a478d716d7fab7c50119b3e5d16e9467b37 100644
--- a/get_citations_HTM_2016.R
+++ b/get_citations_HTM_2016.R
@@ -4,4 +4,7 @@ library(knitcitations)
 # Neumann et al., 2010
 citep("10.1038/nature08869")
 
+# Sommer et al., 2013
+citep("10.1093/bioinformatics/btt175")
+
 write.bibtex(file = "HTM_2016.bib")