Commit 6628ce52 authored by Jean-Karim Heriche

Shorten README. Content moved to wiki.

parent d0f39b20
## Image Data Explorer
### Introduction
Most bioimaging projects derive data from images and regions of interest (ROIs, e.g. segmented objects) on those images. It is desirable, and sometimes necessary, to explore these image-derived data while visualizing the image(s) and ROIs associated with each data point. To address this need, we developed the Image Data Explorer (IDE). The IDE is implemented as a Shiny app (i.e. a web app written in R using the Shiny package).
### Data structure requirements
#### Images
Each image file must contain an image of at most 3 dimensions with the third dimension representing either depth (z coordinate) or time. Images are expected to be organised under one common root directory. For example, if the images are organized like this:
  ▽ screen\_images
      ▽ plate1\_replicate1
          ▽ well001
              W001-P001-Z000-T0000-s1234-Cy3.tif
              W001-P001-Z000-T0000-s1234-EGFP.tif
          ▷ well002
      ▷ plate1\_replicate2
then the image root directory is screen\_images.
The IDE can in principle read all image formats supported by BioFormats but has so far only been tested with TIFF, PNG and JPEG.
#### Data points
##### Format
Image-derived data are expected to be in table format with data points in rows and stored in a tab- or comma-separated text file using ASCII or UTF-8 encoding. The table must have column headers with unique column names and all columns must have a header. For numbers, the decimal separator must be . (dot) and no separator for thousands is allowed.
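The header requirements above can be checked before loading a file. The following is a minimal sketch, assuming a tab-separated file and a POSIX shell with `head` and `awk`; the function name `check_header` is hypothetical and not part of the IDE.

```shell
# Hypothetical helper: check the header line of a tab-separated table.
# Prints a message for each empty or duplicated column name and returns
# a non-zero status if any problem was found.
check_header() {
    head -n 1 "$1" | awk -F '\t' '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "") {
                    printf "column %d has no header\n", i
                    bad = 1
                } else if (seen[$i]++) {
                    printf "duplicate column name: %s\n", $i
                    bad = 1
                }
            }
        }
        END { exit bad + 0 }'
}

# Example use (data.tsv is a placeholder file name):
#   check_header data.tsv && echo "header looks fine"
```

For a comma-separated file, the same idea applies with `-F ','`, provided that no column name contains a quoted comma.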
##### Content
To link data points to images, the table should include one column with the path to image files relative to the image root directory and each cell of this column must reference only one file. Using the example above, the image root directory is 'screen\_images' and therefore the table column for data points associated with image W001-P001-Z000-T0000-s1234-Cy3.tif should contain the relative path 'plate1_replicate1/well001/W001-P001-Z000-T0000-s1234-Cy3.tif'. There can be multiple columns with links to images but only two can be used simultaneously in the IDE.
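One possible way to generate the relative paths for such a column is sketched below. It assumes a POSIX shell with `find`, `sed` and `sort`, and TIFF files with a `.tif` extension; the function name `list_image_paths` is hypothetical and not part of the IDE.

```shell
# Hypothetical helper: print the path of every .tif file under an image root
# directory, relative to that root, one path per line.
list_image_paths() {
    root="$1"
    # Run find from inside the root so that paths come out relative to it,
    # then strip the leading "./" that find prepends.
    (cd "$root" && find . -type f -name '*.tif' | sed 's|^\./||' | sort)
}

# Example use with the directory layout shown above:
#   list_image_paths screen_images > image_paths.txt
```

The resulting paths still have to be matched to the corresponding rows of the data table, which depends on how the table was produced.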
The IDE references ROIs by the coordinates of an anchor point (e.g. the ROI centre), so there should be a column for each of the relevant coordinates: x, y and either z or t. The (x, y) coordinates must be in pixels relative to the top left corner of the image, which is the convention used by most image analysis software.
### Quick start
Before you start, make sure that your data conforms to the requirements described in the [wiki section on preparing data for use with the IDE](https://git.embl.de/meechan/image-data-explorer/-/wikis/Preparing-the-data-for-use-with-the-IDE).
### Using the IDE
#### Requirements
The IDE needs a computer with an [R environment version >=3.5.0](https://www.r-project.org/). The IDE requires some R packages and will try to install them if it can't detect them on the system. In case of problems, it may be better to install the required packages manually.
The IDE requires the following packages from CRAN:
DT, shiny, shinyFiles, shinycssloaders, shinydashboard, shinyjs, shinyWidgets, shinybusy, assertthat, ggplot2, plotly, RANN, MASS, uwot.
From within an R console, install with:
```
> install.packages(c("DT", "shiny", "shinyFiles", "shinycssloaders", "shinydashboard", "shinyjs", "shinyWidgets", "shinybusy", "assertthat", "ggplot2", "plotly", "RANN", "MASS", "uwot"))
```
and from Bioconductor: RBioFormats and EBImage
From within an R console, install with:
```
> install.packages("BiocManager")
> BiocManager::install("aoles/RBioFormats")
> BiocManager::install("EBImage")
```
The IDE needs a computer with an [R environment version >=3.5.0](https://www.r-project.org/). The IDE requires some R packages and will try to install them if it can't detect them on the system. For more details, check the [wiki section on installation](https://git.embl.de/meechan/image-data-explorer/-/wikis/Installation).
#### Installation
Download the code from the project's repository and run it from within the project directory:
```
> git clone git@git.embl.de:meechan/image-data-explorer.git
...
```
The app is then accessible from a web browser at http://127.0.0.1:5476
Alternatively, open the file image_data_explorer.R with [RStudio](https://www.rstudio.com/products/rstudio/download/) then click the 'Run App' button.
#### Building and running as a container
Download and install [Docker](https://docs.docker.com/get-docker/). On Windows, make sure that Docker Desktop uses the Windows Subsystem for Linux 2 (WSL2). Then clone this repository.
From a terminal:
- Navigate to the directory containing the Dockerfile
- Build the container (be patient, this may take a while) with
```
> sudo docker build -t image-data-explorer .
```
- Run the app in the container with:
```
> sudo docker run --rm -p 5476:5476 -v /user/home/dir:/data image-data-explorer
```
Replace /user/home/dir with the path to your user's home directory or any other directory under which the image root directory resides. This is necessary to make the image root directory accessible to the app.
The app can then be accessed from a web browser at http://127.0.0.1:5476
#### Data input
The IDE opens with the 'Input data' section with several boxes:
* **Input data file**
This is where the tabular data is selected and uploaded to the app server. Before uploading a file, make sure that the file has a header and that the correct column separator and quote type are selected.
* **Plot variables**
Select one or two columns whose values will be shown in a scatterplot.
* **Additional variables to display on hover**
By default when hovering over a point in the plot, the values of the plot variables are shown in a table below the plot. This box allows the selection of additional variables to be displayed when hovering over a point.
* **Columns to hide**
Hiding columns minimizes the amount of horizontal scrolling needed to reach columns on the right-hand side of the table when all columns can't fit on screen.
* **Groups**
Select one column whose values will be used to set colours for the points in the plot. Only 9 distinct colours are available, so any selected column with more than 9 distinct values will be ignored.
* **Images**
If images are associated with the data file, select the image root directory then one or two columns containing the paths to the images relative to the selected root directory. If a column name contains the pattern 'image.*path', this column will be preselected in the image 1 field.
* **ROIs**
If the data table rows correspond to ROIs of associated images, select here the columns containing the coordinates of the ROI centres.
If the rows correspond to time points only (i.e. with no ROI definition), select only a column for frame coordinates and leave X and Y coordinates empty. If a column name contains the pattern 'coordinate_X|Y|Z|time', it will be preselected in the matching ROI coordinate field (e.g. a column named cell_coordinate_x will be preselected as column for coordinate X).
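To see in advance which column names would match a preselection pattern, the header line can be tested from a shell. This is only a sketch: it assumes a tab-separated file and case-insensitive matching (suggested by the cell_coordinate_x example, but not stated for image paths), and it does not reproduce the IDE's own matching code. The function name `matching_columns` is hypothetical.

```shell
# Hypothetical helper: list the column names of a tab-separated table that
# match an extended regular expression, ignoring case.
matching_columns() {
    # Put each column name on its own line, then keep the matching ones.
    head -n 1 "$1" | tr '\t' '\n' | grep -i -E "$2"
}

# Example use (data.tsv is a placeholder file name):
#   matching_columns data.tsv 'image.*path'
#   matching_columns data.tsv 'coordinate_x'
```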
#### Data exploration and annotation
Once the data input section has been completed, data exploration is reached by clicking 'Explore' in the sidebar.
This section is formed of three parts:
* **A plot** area on the top left of the screen. This shows a scatterplot of the variables selected in the data input section.
Clicking on a data point in the plot selects it in the data table below and opens the corresponding image(s). If the point is associated with x,y coordinates then a red dot is added to the image(s) at the position given by these coordinates.
* **An image viewer** area next to the plot area. This is where images selected under image 1 in the data input section will appear.
Clicking on the image selects the corresponding row in the data table and highlights the corresponding point in the plot. If rows correspond to ROIs then the click position is indicated by a red dot and the data point corresponding to the closest ROI in the image is selected in both the data table and the plot.
* **A data table** area at the bottom of the screen. The data table shows the content of the uploaded data file. A tab allows switching to a second image viewer where images selected under image 2 in the data input section will appear. This second image viewer behaves like the one described above. Clicking on a table row selects it and highlights the corresponding point in the plot. No image is shown when selecting multiple rows. The table is searchable globally using the 'Search' box in the top right corner above the table or by column using the boxes atop each column. Searches filter the rows to be displayed in the table. To select all the rows and highlight them in the plot, click the button labeled 'Show filtered rows in plot' above the table. To deselect all selected rows, click the 'Clear selection' button. To annotate the selected rows, click the 'Annotate selection' button. This is only available if an annotation column has been chosen in the 'Annotate' section.
#### Data annotation
Individual data points, i.e. rows of the data table, can be associated with a label. Annotation starts by visiting the 'Annotate' section (accessible from the sidebar). There, a column to hold the annotations can be selected and labels defined. If an existing column is selected, its distinct values will be available as labels. New labels can also be added. Alternatively, a new column can be created, in which case, new labels must be provided. New labels must be entered as a comma-separated list. Once done, choices must be confirmed by clicking the 'Apply' button. Annotations can then be performed using the 'Annotate selection' button in the 'Explore' section.
#### Dimensionality reduction
To help visualize the overall structure of the data, several numerical variables can be combined into a 2D projection. The numerical columns and the method to use can be selected in the 'Dimensionality reduction' section. Applying a dimensionality reduction method creates two new columns containing new coordinates for all data points. When the same method is run multiple times, the coordinate columns are re-used (i.e. new columns are not created for each new run). Upon successful completion of the dimensionality reduction, the new columns are automatically selected for plotting and the view switches to the 'Explore' section.
#### Screenshot
The app can also be [run in a container](https://git.embl.de/meechan/image-data-explorer/-/wikis/Installation#building-and-running-the-app-in-a-container).
More details can be found in the [wiki](https://git.embl.de/meechan/image-data-explorer/-/wikis/home#-user-manual).
### Screenshot
<img src="screenshots/image_data_explorer_v0.3.screenshot.png"/>