platy-browser-data

Data and data-generation for the platybrowser.

Data storage

Image data and derived data for all versions are stored in the folder data. We follow a versioning scheme inspired by semantic versioning, hence all version numbers are given as MAJOR.MINOR.PATCH.

  • PATCH is increased if the derived data is updated, e.g. due to corrections in some segmentation or new attributes in some table. It is increased by update_patch.py.
  • MINOR is increased if new derived data is added, e.g. a new segmentation for some structure or a new attribute table. It is increased by update_minor.py.
  • MAJOR is increased if new image / raw data is added, e.g. a new animal registered to the atlas or new genes. It is increased by update_major.py.
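
For example, starting from version 1.2.3: update_patch.py would produce 1.2.4, update_minor.py 1.3.0 and update_major.py 2.0.0 (assuming that the lower digits reset to zero, as in semantic versioning).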

For a given version X.Y.Z, the data is stored in the directory /data/X.Y.Z/ with subfolders:

  • images: Raw image or gene expression data. Contains BigDataViewer xml and hdf5 files. The hdf5 files are not under version control.
  • misc: Miscellaneous data.
  • segmentations: Segmentation volumes derived from the image data.
  • tables: CSV tables with attributes derived from image data and segmentations.
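
For example, a release folder could be laid out as follows (file names are illustrative, taken from the examples below):

data/X.Y.Z/
    images/
        sbem-6dpf-1-whole-raw.xml
        sbem-6dpf-1-whole-raw.h5
    misc/
    segmentations/
        sbem-6dpf-1-whole-segmented-cells-labels.xml
    tables/
        sbem-6dpf-1-whole-segmented-cells-labels/
            default.csv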

File naming

Xml / hdf5 filenames must adhere to the following naming scheme, so that the origin of the data can be identified unambiguously: the names must be prefixed by MODALITY-STAGE-ID-REGION, where

  • MODALITY is a shorthand for the imaging modality used to obtain the data, e.g. sbem for serial blockface electron microscopy.
  • STAGE is a shorthand for the developmental stage, e.g. 6dpf for six days post fertilisation.
  • ID is a number that distinguishes individual animals of a given modality and stage, or distinguishes different set-ups for averaging-based modalities like prospr.
  • REGION is a shorthand for the region of the animal covered by the data, e.g. parapod for the parapodium or whole for the whole animal.
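
For example, the segmentation name sbem-6dpf-1-whole-segmented-cells-labels used below decomposes into the modality sbem, the stage 6dpf, the id 1 and the region whole, followed by the dataset-specific name segmented-cells-labels.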

Currently, the data contains the following modalities:

  • sbem-6dpf-1-whole
  • prospr-6dpf-1-whole

Table storage

Derived attributes are stored in csv tables. Tables must be associated with a segmentation file segmentations/segmentation-name.xml. All tables associated with a given segmentation must be stored in the sub-directory tables/segmentation-name. If this directory exists, it must at least contain the file default.csv with spatial attributes of the segmentation objects, which are necessary for the platybrowser table functionality.

If tables do not change between versions, they can be stored as soft-links to the old version.
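
For example, an unchanged table could be linked to the previous version like this (version numbers and names illustrative):

ln -s ../../../1.0.0/tables/sbem-6dpf-1-whole-segmented-cells-labels/default.csv \
      data/1.0.1/tables/sbem-6dpf-1-whole-segmented-cells-labels/default.csv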

Usage

We provide three scripts to update the respective release digit:

  • update_patch.py: Create new version folder and update derived data.
  • update_minor.py: Create new version folder and add derived data.
  • update_major.py: Create new version folder and add primary data.

All three scripts take the path to a json file as argument. The json needs to encode which data to update/add according to the following specifications:
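
All three scripts are invoked the same way, e.g. (assuming they are run from the repository root):

python update_patch.py /path/to/update.json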

For update_patch, the json needs to contain a dictionary with the two keys segmentations and tables. Each key needs to map to a list that contains (valid) names of segmentations. For names listed in segmentations, the segmentation AND corresponding tables will be updated. For tables, only the tables will be updated. The following example would trigger a segmentation and table update for the cell segmentation and a table update for the nucleus segmentation:

{"segmentations": ["sbem-6dpf-1-whole-segmented-cells-labels"],
 "tables": ["sbem-6dpf-1-whole-segmented-nuclei-labels"]}

For update_minor, the json needs to contain a list of dictionaries. Each dictionary corresponds to new data to add to the platy browser. There are three valid types of data, each with different required and optional fields:

  • images: New image data. Required fields are source, name and input_path. source refers to the primary data the image data is associated with, see naming scheme. name specifies the name this image data will have, excluding the naming scheme prefix. input_path is the path to the data to add, which needs to be in bdv hdf5 format. The field is_private is optional; if it is true, the data will not be exposed on the public BigDataServer.
  • static segmentations: New (static) segmentation data. The required fields are source, name and segmentation_path (corresponding to input_path in images). The fields table_path_dict and is_private are optional. table_path_dict specifies tables associated with the segmentation as a dictionary {"table_name1": "/path/to/table1.csv", ...}. If given, one of the table names must be default.
  • dynamic segmentations: New (dynamic) segmentation data. The required fields are source, name, paintera_project and resolution. paintera_project specifies path and key of a n5 container storing paintera corrections for this segmentation. resolution is the segmentation's resolution in micrometer. The fields table_update_function and is_private are optional. table_update_function can be specified to register a function to generate tables for this segmentation. The function must be importable from scripts.attributes. The following example would add a new prospr gene to the images and a new static and dynamic segmentation derived from the em data:
[{"source": "prospr-6dpf-1-whole", "name": "new-gene-MED", "input_path": "/path/to/new-gene-data.xml"}
 {"source": "sbem-6dpf-1-whole", "name": "new-static-segmentation", "segmentation_path": "/path/to/new-segmentation-data.xml",
  "table_path_dict": {"default": "/path/to/default-table.csv", "custom": "/path/to/custom-table.csv"}},
 {"source": "sbem-6dpf-1-whole", "name": "new-dynamic-segmentation", "paintera_project": ["/path/to/dynamic-segmentation.n5", "/paintera/project"],
  "table_update_function": "new_update_function"}]

For update_major, the json needs to contain a dictionary. The dictionary keys correspond to new primary sources (cf. naming scheme) to add to the platy browser. Each key needs to map to a list of data entries. The specification of these entries corresponds to update_minor, except that the field source is not necessary. The following example would add a new primary data source (FIBSEM) and add the corresponding raw data as private data:

{"fib-6dpf-1-whole": [{"name": "raw", "input_path": "/path/to/fib-raw.xml", "is_private": "true"}]}

See example_updates/ for additional json update files.

For now, we do not add any files to version control automatically, so after calling one of the update scripts you must add all new files yourself and then make a release via git tag -a X.Y.Z -m "DESCRIPTION".
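
For example (version number and message are placeholders):

git add data/X.Y.Z
git commit -m "Update data to X.Y.Z"
git tag -a X.Y.Z -m "DESCRIPTION"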

In addition, the script make_dev_folder.py can be used to create a development folder. It copies the most recent release folder into a folder prefixed with dev-, which will not be put under version control.
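
A typical call (a sketch; the script may accept additional arguments):

python make_dev_folder.py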

Installation

The data is currently hosted on the arendt EMBL share, where a conda environment with all necessary dependencies is available. This environment is used by default.

The environment can be created elsewhere using the environment.yaml file we provide:

conda env create -f environment.yaml
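
Afterwards, activate it with conda activate NAME, where NAME is the name field defined in environment.yaml.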

BigDataServer

The platy browser can be served with BigDataServer. On the EMBL server, you can start it from the misc directory of one of the version folders:

cd data/X.Y.Z/misc
java -jar /g/cba/exchange/bigdataserver/bigdataviewer-server-2.1.2-jar-with-dependencies.jar -d bdv_server.txt
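
The file bdv_server.txt is the dataset list passed to the server via -d. Following BigDataServer conventions, each line is expected to map a dataset name to the path of its bigdataviewer xml, separated by a tab, e.g. (entry illustrative):

sbem-6dpf-1-whole-raw    ../images/sbem-6dpf-1-whole-raw.xml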

Data generation

In addition to the data, this repository also collects the scripts for generating the registration and the derived data:

Registration

The folder registration contains:

  • transfer_ProSPr_data: This folder contains the scripts needed to copy and process the ProSPr output to /g/arendt/EM_6dpf_segmentation/platy-browser-data/data/rawdata/prospr. It reads the .tif files that will be registered to the EM (e.g. MEDs, tissue manual segmentations, reference), mirrors them along the x axis, adds size information (0.55 um/px) and deals with the gene names. Run it on the cluster:
sbatch ./ProSPr_copy_and_mirror.sh
  • ProSPr_files_for_registration: The three files that guide the transformation of the prospr space into the EM space.

Segmentation

The folder scripts/segmentation contains the scripts to generate the derived segmentations with automated segmentation approaches. The script deprecated/make_initial_version.py was used to generate the initial data in /data/0.0.0.