expanded section on data import from ch5 files

Commit cf97a873 authored 8 years ago by Bernd Klaus
--- a/HTM_2016.bib
+++ b/HTM_2016.bib
@@ -11,3 +11,17 @@
  title = {Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes},
  journal = {Nature},
 }
+
+@Article{Sommer_2013,
+  doi = {10.1093/bioinformatics/btt175},
+  url = {http://dx.doi.org/10.1093/bioinformatics/btt175},
+  year = {2013},
+  month = {apr},
+  publisher = {Oxford University Press ({OUP})},
+  volume = {29},
+  number = {12},
+  pages = {1580--1582},
+  author = {C. Sommer and M. Held and B. Fischer and W. Huber and D. W. Gerlich},
+  title = {{CellH}5: a format for data exchange in high-content screening},
+  journal = {Bioinformatics},
+}
--- a/Tutorial_HTM_2016.Rmd
+++ b/Tutorial_HTM_2016.Rmd
@@ -62,7 +62,10 @@ to the one in @Neumann_2010.

 # Annotation import

-```{r}
+We first import the plate annotation. This is a table that describes
+the content of every well on the plate.
+
+```{r import_annotation}
 data_path <- "~/p12_data"
 plate_map <- read.xlsx(xlsxFile = file.path(data_path, "plate_mapping.xlsx"))
 head(plate_map)
@@ -71,10 +74,20 @@ head(plate_map)

 # Importing the raw data

-* importing using `r Biocpkg("rhdf5")`
-* possibly discuss the hdf5 format
+We will now import the raw data. This data is stored in a variant of the [HDF5 format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) called
+[CellH5](http://www.cellh5.org/),
+which defines a more restricted sub-format designed specifically to store data
+from high-content screens. More information can be found in the paper by
+@Sommer_2013.
+
+In the code below, we use the [cellh5](https://github.com/CellH5/cellh5-R) R package
+to import the data. The file `_all_positions.ch5` contains links to the other `ch5`
+files that contain the full data of the plate. 
+We are only interested in the predictions produced 
+by the machine learning algorithm, so we only extract this part of the file.

-```{r readingCellH5, eval=FALSE}
+
+```{r readingCellH5, dependson="import_annotation", eval=FALSE}
 path <- file.path(data_path, "_all_positions.ch5")
 c5f <- CellH5(path)
 c5_pos <- C5Positions(c5f, C5Plates(c5f))
@@ -85,7 +98,9 @@ c5_pos[["WB08_P1"]] <- NULL
 ```


-# Extract raw data
+# Tabulate the raw data
+
+We now tabulate the imported data.

 ```{r, eval=FALSE}


--- a/Tutorial_HTM_2016.html
+++ b/Tutorial_HTM_2016.html
@@ -89,7 +89,7 @@ document.addEventListener("DOMContentLoaded", function() {
 <li><a href="#about-the-tutorial"><span class="toc-section-number">2</span> About the tutorial</a></li>
 <li><a href="#annotation-import"><span class="toc-section-number">3</span> Annotation import</a></li>
 <li><a href="#importing-the-raw-data"><span class="toc-section-number">4</span> Importing the raw data</a></li>
-<li><a href="#extract-raw-data"><span class="toc-section-number">5</span> Extract raw data</a></li>
+<li><a href="#tabulate-the-raw-data"><span class="toc-section-number">5</span> Tabulate the raw data</a></li>
 <li><a href="#the-concept-of-tidy-data"><span class="toc-section-number">6</span> The concept of tidy data</a></li>
 <li><a href="#reshaping-the-screen-data"><span class="toc-section-number">7</span> Reshaping the screen data</a><ul>
 <li><a href="#plotting-in-r-ggplot2"><span class="toc-section-number">7.1</span> Plotting in R: ggplot2</a></li>
@@ -126,6 +126,7 @@ rmarkdown::render('Tutorial_HTM_2016.Rmd', BiocStyle::pdf_document())
 </div>
 <div id="annotation-import" class="section level1">
 <h1><span class="header-section-number">3</span> Annotation import</h1>
+<p>We first import the plate annotation. This is a table that describes the content of every well on the plate.</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">data_path &lt;-<span class="st"> &quot;~/p12_data&quot;</span>
 plate_map &lt;-<span class="st"> </span><span class="kw">read.xlsx</span>(<span class="dt">xlsxFile =</span> <span class="kw">file.path</span>(data_path, <span class="st">&quot;plate_mapping.xlsx&quot;</span>))
 <span class="kw">head</span>(plate_map)</code></pre></div>
@@ -139,10 +140,8 @@ plate_map &lt;-<span class="st"> </span><span class="kw">read.xlsx</span>(<span
 </div>
 <div id="importing-the-raw-data" class="section level1">
 <h1><span class="header-section-number">4</span> Importing the raw data</h1>
-<ul>
-<li>importing using <em><a href="http://bioconductor.org/packages/rhdf5">rhdf5</a></em></li>
-<li>possibly discuss the hdf5 format</li>
-</ul>
+<p>We will now import the raw data. This data is stored in a variant of the <a href="https://en.wikipedia.org/wiki/Hierarchical_Data_Format">HDF5 format</a> called <a href="http://www.cellh5.org/">CellH5</a>, which defines a more restricted sub-format designed specifically to store data from high-content screens. More information can be found in the paper by <span class="citation">Sommer et al. (2013)</span>.</p>
+<p>In the code below, we use the <a href="https://github.com/CellH5/cellh5-R">cellh5</a> R package to import the data. The file <code>_all_positions.ch5</code> contains links to the other <code>ch5</code> files that contain the full data of the plate. We are only interested in the predictions produced by the machine learning algorithm, so we only extract this part of the file.</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">path &lt;-<span class="st"> </span><span class="kw">file.path</span>(data_path, <span class="st">&quot;_all_positions.ch5&quot;</span>)
 c5f &lt;-<span class="st"> </span><span class="kw">CellH5</span>(path)
 c5_pos &lt;-<span class="st"> </span><span class="kw">C5Positions</span>(c5f, <span class="kw">C5Plates</span>(c5f))
@@ -150,8 +149,9 @@ predictions &lt;-<span class="st"> </span><span class="kw">C5Predictions</span>(

 c5_pos[[<span class="st">&quot;WB08_P1&quot;</span>]] &lt;-<span class="st"> </span><span class="ot">NULL</span></code></pre></div>
 </div>
-<div id="extract-raw-data" class="section level1">
-<h1><span class="header-section-number">5</span> Extract raw data</h1>
+<div id="tabulate-the-raw-data" class="section level1">
+<h1><span class="header-section-number">5</span> Tabulate the raw data</h1>
+<p>We now tabulate the imported data.</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">raw_data &lt;-<span class="st"> </span><span class="kw">sapply</span>(c5_pos, function(pos){
                predictions &lt;-<span class="st"> </span><span class="kw">C5Predictions</span>(c5f, pos, <span class="dt">mask =</span> <span class="st">&quot;primary__primary3&quot;</span>, <span class="dt">as =</span> <span class="st">&quot;name&quot;</span>)
                <span class="kw">table</span>(predictions)}

--- a/get_citations_HTM_2016.R
+++ b/get_citations_HTM_2016.R
@@ -4,4 +4,7 @@ library(knitcitations)
 # Neuman et. al., 2010
 citep("10.1038/nature08869")

+# Sommer et al., 2013
+citep("10.1093/bioinformatics/btt175")
+
 write.bibtex(file = "HTM_2016.bib")
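
Note on the tabulation step in `Tutorial_HTM_2016.Rmd`: the `sapply(..., table(...))` pattern only simplifies to a matrix when every position yields a table of the same length. The following is a minimal base R sketch of one way to guarantee that. It does not use the cellh5 package; the position names, class labels and the `predictions_by_pos` list are made-up stand-ins for the per-position output of `C5Predictions()`.

```r
# Hypothetical stand-in for the per-position class predictions;
# position names and class labels are invented for illustration.
class_levels <- c("inter", "pro", "meta")
predictions_by_pos <- list(
  WA01_P1 = c("inter", "inter", "pro", "meta"),
  WA02_P1 = c("inter", "meta", "meta"),
  WA03_P1 = c("inter", "inter", "inter")
)

# Converting to a factor with fixed levels keeps classes with zero counts,
# so every position returns a table of the same length and sapply()
# simplifies the result to a classes x positions matrix.
raw_counts <- sapply(predictions_by_pos, function(pred) {
  table(factor(pred, levels = class_levels))
})

print(raw_counts)
```

Without the fixed factor levels, a position in which some class never occurs would return a shorter table, and `sapply()` would fall back to returning a list instead of a matrix.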