- The navigation sidebar
- The workspaces
The app screen is divided into a narrow navigation sidebar on the left and a wider workspace area on the right.
The navigation sidebar
The navigation sidebar lists available tools. Clicking on a tool in the sidebar switches the app to the corresponding workspace.
Upon starting, the app opens with the data input workspace. Other tools can only be used after uploading a data file.
Data input
This is where the data table is uploaded to the app. Because the app makes few assumptions about the table content, it is up to the user to indicate which columns contain relevant information. The data input workspace is divided into boxes corresponding to the different inputs required from the user:
Input data file
This is where the tabular data file is selected and uploaded to the app server. Before uploading a file, make sure that it has a header row.
This box allows selecting one or two columns whose values are shown in the plot area. If two columns are selected but the plot type needs only one, the first selected column is used. If the values are associated with plate/well information, they are also rendered as a colour gradient on the wells of the corresponding plate in the plate viewer. When only one column is selected, its values are plotted on the y-axis of the scatterplot against the index of the data points on the x-axis.
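As an illustration of how values can be turned into a colour gradient on plate wells, the sketch below linearly interpolates between two colours according to where each value falls in the plate's value range. The white-to-dark-blue palette, the per-plate (rather than global) value range and the helper names `value_to_rgb` and `well_colours` are all hypothetical. The app's own rendering may differ.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

def value_to_rgb(value: float, vmin: float, vmax: float,
                 low: RGB = (255, 255, 255), high: RGB = (0, 0, 139)) -> RGB:
    """Interpolate between two colours (hypothetical palette) by value's position in [vmin, vmax]."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the range
    return tuple(round(a + t * (b - a)) for a, b in zip(low, high))

def well_colours(values_by_well: Dict[str, float]) -> Dict[str, RGB]:
    """Colour every well of one plate on a gradient spanning that plate's value range (assumption)."""
    vmin, vmax = min(values_by_well.values()), max(values_by_well.values())
    return {well: value_to_rgb(v, vmin, vmax) for well, v in values_by_well.items()}
```

With this approach, the lowest value on a plate maps to white, the highest maps to dark blue, and every other value falls proportionally in between.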
Additional variables to display on hover
By default, when hovering over a point in the plot, the values of the plot variables are shown in a table below the plot. This box allows selecting additional variables to display when hovering over a data point.
Columns to hide
Hiding columns reduces the horizontal scrolling needed to reach columns on the right-hand side of the table when not all columns fit on screen.
This box allows selecting one column whose values are used to colour the points in the scatterplot and to split the values of the plot variable into groups, with one histogram plotted per group. Only 9 distinct colours are available, so a selected column with more than 9 distinct values is ignored.
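The 9-colour limit can be pictured as a lookup from distinct group values to a fixed palette. If there are more groups than colours, the grouping is dropped. The sketch below assumes the ColorBrewer Set1 palette and a first-seen ordering of groups, neither of which is stated in this guide. `assign_group_colours` is a hypothetical helper.

```python
from typing import Dict, Hashable, Iterable, Optional

# Assumed 9-colour qualitative palette (ColorBrewer Set1); the app's actual colours may differ.
PALETTE = ["#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00",
           "#FFFF33", "#A65628", "#F781BF", "#999999"]

def assign_group_colours(values: Iterable[Hashable]) -> Optional[Dict[Hashable, str]]:
    """Map each distinct group value to a colour, or return None when there are too many groups."""
    groups = list(dict.fromkeys(values))  # distinct values in first-seen order
    if len(groups) > len(PALETTE):
        return None  # mirrors the documented behaviour: the column is ignored
    return dict(zip(groups, PALETTE))
```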
If images are associated with the data file, select the image root directory, i.e. the top-level directory relative to which the image file paths in the data table are given. The image root directory can be an S3-compatible object store. Then select one or two columns containing the image paths relative to the selected root directory. If a column name contains the pattern 'image.*path', that column is preselected in the image 1 field.
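The sketch below shows two things: how the 'image.*path' pattern can drive preselection, and how a relative path combines with the root directory. It assumes the pattern is matched case-insensitively, that the first matching column wins, and that a root is either a local directory or an S3-style URL joined with "/". These are assumptions, not documented behaviour. Both helper names are hypothetical.

```python
import re
from typing import List, Optional

IMAGE_PATTERN = re.compile(r"image.*path", re.IGNORECASE)  # case-insensitivity is an assumption

def preselect_image_column(columns: List[str]) -> Optional[str]:
    """Return the first column whose name contains the pattern 'image.*path', if any."""
    for name in columns:
        if IMAGE_PATTERN.search(name):
            return name
    return None

def resolve_image_path(root: str, relative_path: str) -> str:
    """Join the image root directory (local path or S3-style URL) with a path from the table."""
    return root.rstrip("/") + "/" + relative_path.lstrip("/")
```

For example, `resolve_image_path("s3://bucket/images/", "plate1/A01.tif")` gives `s3://bucket/images/plate1/A01.tif`.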
If the rows of the data table correspond to ROIs in the associated images, select the columns containing the coordinates of the ROI anchor points. If the rows correspond to time points only (i.e. with no ROI definition), select only a column for frame coordinates and leave the X and Y coordinates empty. If a column name contains the pattern 'coordinate_X|Y|Z|time', it is preselected in the matching ROI coordinate field (e.g. a column named cell_coordinate_x is preselected as the column for coordinate X).
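The pattern above is read here as "coordinate_" followed by X, Y, Z or time. The example cell_coordinate_x suggests case-insensitive matching. The sketch below implements that reading and keeps the first matching column for each field. Both the reading of the pattern and the first-match rule are assumptions, and `preselect_roi_columns` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Assumed reading of the documented pattern, matched case-insensitively.
ROI_PATTERN = re.compile(r"coordinate_(x|y|z|time)", re.IGNORECASE)
FIELDS = {"x": "X", "y": "Y", "z": "Z", "time": "time"}

def preselect_roi_columns(columns: List[str]) -> Dict[str, str]:
    """Map each ROI coordinate field ('X', 'Y', 'Z', 'time') to the first matching column."""
    selected: Dict[str, str] = {}
    for name in columns:
        match = ROI_PATTERN.search(name)
        if match:
            selected.setdefault(FIELDS[match.group(1).lower()], name)
    return selected
```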
High-throughput microscopy info
High-throughput microscopy is often carried out in multiwell plates, in which case each row of the data table is associated with a plate and a well of that plate. This is where the columns containing plate and well information are selected. When images from multiple fields of view inside a well are available, the fields should be identified in a separate column, selected under 'column for fields/positions'.
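Placing a row on a plate layout requires converting its well identifier into row and column positions. The sketch below assumes the common letter-plus-number convention (e.g. "A01", "h12"), with rows A to P to cover plates of up to 384 wells. This guide does not specify the well format the app expects, and `parse_well` is a hypothetical helper.

```python
import re
from typing import Tuple

# Assumed well ID format: one row letter (A-P) followed by a 1- or 2-digit column number.
WELL_RE = re.compile(r"^([A-Pa-p])(\d{1,2})$")

def parse_well(well: str) -> Tuple[int, int]:
    """Convert a well ID such as 'B03' into 0-based (row, column) plate indices."""
    match = WELL_RE.match(well.strip())
    if match is None or int(match.group(2)) < 1:
        raise ValueError(f"unrecognised well identifier: {well!r}")
    row = ord(match.group(1).upper()) - ord("A")
    column = int(match.group(2)) - 1
    return row, column
```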
The information entered in all the other boxes (i.e. everything except the input data file) can be saved and downloaded as a configuration file in rds format. When a data file is uploaded, a browse button appears that allows selecting and uploading a previously saved configuration file. Upon upload, the input boxes are populated with the values saved in the file.
Currently, no attempt is made to check the validity of an uploaded configuration file. Mismatches between the values in the configuration file and the column names of the uploaded data file can result in unpredictable behaviour.
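Because the app does not validate configuration files, it can help to compare the column names a configuration refers to against the data file's header before uploading it. The sketch below assumes the configuration has already been read into a dictionary mapping each input box to a column name, a list of names, or nothing. Reading the rds file itself is out of scope here, and the dictionary layout and the helper `missing_columns` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Union

ConfigValue = Optional[Union[str, List[str]]]

def missing_columns(config: Dict[str, ConfigValue], header: Iterable[str]) -> List[str]:
    """Return column names referenced in a configuration that are absent from the data header."""
    available = set(header)
    missing: List[str] = []
    for value in config.values():
        names = [value] if isinstance(value, str) else (value or [])
        missing.extend(name for name in names if name not in available)
    return missing
```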
Explore
The explore workspace is where the interactive data visualization happens. It is divided into three areas:
- A plot area in the top left of the screen. By default, this shows a scatterplot of the variables selected in the data input section. If columns for plates and wells have been selected, a plate viewer is also available. Clicking a data point in the plot or a well in the plate viewer selects it in the data table below and opens the corresponding image(s). If the point is associated with x,y coordinates, a red dot is added to the image(s) at the position given by these coordinates.
- An image viewer area next to the plot area. This is where images selected under image 1 in the data input section appear. Clicking on the image selects the corresponding row in the data table and highlights the corresponding point in the plot. If rows correspond to ROIs, the click position is indicated by a red dot and the data point corresponding to the closest ROI in the image is selected in both the data table and the plot. Holding the Shift key while clicking anywhere on the image enters multiple selection mode, in which each subsequent Shift+click is recorded and indicated by a cyan dot. Clicking anywhere on the image without holding Shift exits multiple selection mode. When zoomed in, the arrow keys can be used to move the field of view. Pressing h displays a list of the actions available in the image viewer.
- A data table area at the bottom of the screen. The data table shows the content of the uploaded data file. A tab allows switching to a second image viewer, where images selected under image 2 in the data input section appear; it behaves like the image viewer described above. Clicking a table row selects it and highlights the corresponding point in the plot and in the image viewers. When multiple rows are selected, no image is shown unless the corresponding objects belong to the same image. The table can be searched globally using the 'Search' box in the top right corner above the table, or by column using the boxes at the top of each column. Searches filter the rows displayed in the table. To select all filtered rows and highlight them in the plot, click the 'Show filtered rows in plot' button above the table. To deselect all selected rows, click the 'Clear selection' button. To annotate the selected rows, click the 'Annotate selection' button; this is only available if an annotation column has been chosen in the 'Annotate' workspace.
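When rows correspond to ROIs, a click in the image viewer selects the data point of the closest ROI. The sketch below finds that ROI among the anchor points of the displayed image. It assumes plain Euclidean distance in image pixel coordinates and a list of `(row_index, x, y)` tuples restricted to the current image. Both are assumptions, and `closest_roi` is a hypothetical helper.

```python
import math
from typing import Optional, Sequence, Tuple

def closest_roi(click: Tuple[float, float],
                rois: Sequence[Tuple[int, float, float]]) -> Optional[int]:
    """Return the table row index of the ROI anchor nearest to the click (None if no ROIs)."""
    best_row: Optional[int] = None
    best_dist = math.inf
    for row, x, y in rois:
        dist = math.hypot(x - click[0], y - click[1])  # Euclidean distance (assumed metric)
        if dist < best_dist:
            best_row, best_dist = row, dist
    return best_row
```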
Annotation
Individual data points, i.e. rows of the data table, can be associated with a label. Annotation starts in the 'Annotate' workspace (accessible from the sidebar). There, a column to hold the annotations can be selected and labels defined. If an existing column is selected, its distinct values are available as labels, and new labels can also be added. Alternatively, a new column can be created, in which case new labels must be provided. New labels must be entered as a comma-separated list. Once done, confirm the choices by clicking the 'Apply' button. Annotations can then be made using the 'Annotate selection' button in the 'Explore' workspace.
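The annotation workflow can be sketched in three steps. First, parse the comma-separated list of new labels. Second, combine it with the distinct values of an existing column. Third, write a label into the selected rows. The sketch assumes that whitespace around labels is trimmed and that empty or duplicate entries are dropped. This guide does not state either behaviour, and all three helpers are hypothetical.

```python
from typing import Iterable, List, Optional

def parse_labels(text: str) -> List[str]:
    """Split a comma-separated label list, trimming whitespace and dropping empties/duplicates."""
    labels = (part.strip() for part in text.split(","))
    return list(dict.fromkeys(label for label in labels if label))

def available_labels(existing: Iterable[Optional[str]], new_text: str = "") -> List[str]:
    """Distinct values of an existing annotation column, followed by any new labels."""
    current = [value for value in existing if value is not None]
    return list(dict.fromkeys(current + parse_labels(new_text)))

def annotate(column: List[Optional[str]], selected_rows: Iterable[int],
             label: str, allowed: List[str]) -> None:
    """Write label into the annotation column (in place) for each selected row."""
    if label not in allowed:
        raise ValueError(f"unknown label: {label!r}")
    for row in selected_rows:
        column[row] = label
```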
Dimensionality reduction
To help visualize the overall structure of the data, several numerical variables can be combined into a 2D projection. The numerical columns and the method to use are selected in the 'Dimensionality reduction' section. Applying a dimensionality reduction method creates two new columns containing the new coordinates of all data points. When the same method is run multiple times, its coordinate columns are reused (i.e. new columns are not created for each run). Upon successful completion, the new columns are automatically selected for plotting and the view switches back to the 'Explore' workspace.
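This guide does not list the available methods, so the sketch below uses principal component analysis (PCA) purely as an example of a 2D projection. It also mimics the column-reuse behaviour by writing both coordinates into columns named after the method, so that a second run overwrites the first. The table is represented as a dictionary of lists for simplicity. The column names `PCA1`/`PCA2` and both helpers are hypothetical. At least two numerical features are required.

```python
import numpy as np

def pca_2d(features: np.ndarray) -> np.ndarray:
    """Project rows of an (n_samples, n_features) array onto the first two principal components."""
    centred = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T

def store_projection(table: dict, coords: np.ndarray, method: str = "PCA") -> None:
    """Write the coordinates into two method-named columns, overwriting any earlier run."""
    table[f"{method}1"] = coords[:, 0].tolist()
    table[f"{method}2"] = coords[:, 1].tolist()
```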
Classification and feature selection
In this workspace, data points can be classified using the XGBoost implementation of gradient boosted decision trees. The classifier's inputs are a set of numerical features and a training set consisting of rows annotated with the classes to consider in the selected target annotation column. The IDE performs 5-fold cross-validation, using 2/3 of the annotated data for training and 1/3 for validation. It outputs a plot of feature importance and some statistics on the classifier's performance. The plot shows the importance of the top 10 features and how these features cluster together. The classifier can then be applied to the whole data set, with the predictions stored in an additional column named xgboost.predictions.
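The data split described above can be read in more than one way. The sketch below illustrates one possible reading: shuffle the annotated rows, hold out 1/3 for validation, and divide the remaining 2/3 into 5 cross-validation folds. It also shows how the top 10 features can be picked from importance scores. The shuffling, the fixed seed, the round-robin fold assignment and both helper names are assumptions, not the app's documented procedure. Training the XGBoost model itself is not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

def holdout_and_folds(rows: Sequence[int], n_folds: int = 5, holdout: float = 1 / 3,
                      seed: int = 0) -> Tuple[List[int], List[List[int]]]:
    """Shuffle annotated rows, hold out a fraction for validation, split the rest into folds."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout)
    validation, training = shuffled[:n_holdout], shuffled[n_holdout:]
    folds = [training[i::n_folds] for i in range(n_folds)]  # round-robin fold assignment
    return validation, folds

def top_features(importance: Dict[str, float], k: int = 10) -> List[str]:
    """Names of the k features with the highest importance scores."""
    return sorted(importance, key=importance.get, reverse=True)[:k]
```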