# platy-browser-data
Data and data-generation for the platybrowser.
## Data storage
Image data and derived data for all versions are stored in the folder `data`.
We follow a versioning scheme inspired by semantic versioning, hence all version numbers are given as `MAJOR.MINOR.PATCH`.
- `PATCH` is increased if the derived data is updated, e.g. due to corrections in some segmentation or new attributes in some table. It is increased by `update_patch.py`.
- `MINOR` is increased if new derived data is added, e.g. a new segmentation for some structure or a new attribute table. It is increased by `update_minor.py`.
- `MAJOR` is increased if new image / raw data is added, e.g. a new animal registered to the atlas or new genes. It is increased by `update_major.py`.
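For illustration, and assuming resets follow the usual semantic-versioning convention, the three scripts would bump a hypothetical version `0.6.3` like this:

```
0.6.3 --update_patch.py--> 0.6.4
0.6.3 --update_minor.py--> 0.7.0
0.6.3 --update_major.py--> 1.0.0
```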
For a given version `X.Y.Z`, the data is stored in the directory `/data/X.Y.Z/` with subfolders:

- `images`: Raw image or gene expression data. Contains BigDataViewer xml and hdf5 files. The hdf5 files are not under version control.
- `misc`: Miscellaneous data.
- `segmentations`: Segmentation volumes derived from the image data.
- `tables`: CSV tables with attributes derived from image data and segmentations.
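A hypothetical layout of a single version folder, with placeholder file names assembled from the examples further below:

```
data/1.0.0/
├── images/
│   └── prospr-6dpf-1-whole-new-gene-MED.xml
├── misc/
├── segmentations/
│   └── sbem-6dpf-1-whole-segmented-cells-labels.xml
└── tables/
    └── sbem-6dpf-1-whole-segmented-cells-labels/
        └── default.csv
```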
## File naming
Xml / hdf5 filenames must adhere to the following naming scheme in order to clearly identify the origin of the data:
the names must be prefixed by the header `MODALITY-STAGE-ID-REGION`, where

- `MODALITY` is a shorthand for the imaging modality used to obtain the data, e.g. `sbem` for serial blockface electron microscopy.
- `STAGE` is a shorthand for the developmental stage, e.g. `6dpf` for six days post fertilisation.
- `ID` is a number that distinguishes individual animals of a given modality and stage, or distinguishes different set-ups for averaging-based modalities like prospr.
- `REGION` is a shorthand for the region of the animal covered by the data, e.g. `parapod` for the parapodium or `whole` for the whole animal.
Currently, the data contains the two modalities

- `sbem-6dpf-1-whole`
- `prospr-6dpf-1-whole`
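Putting prefix and name together (assuming they are joined with a dash, as in the examples further below), full file names look like this:

```
sbem-6dpf-1-whole-segmented-cells-labels.xml
prospr-6dpf-1-whole-new-gene-MED.xml
```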
## Table storage
Derived attributes are stored in csv tables. Tables must be associated with a segmentation file `segmentations/segmentation-name.xml`.
All tables associated with a given segmentation must be stored in the sub-directory `tables/segmentation-name`.
If this directory exists, it must at least contain the file `default.csv` with spatial attributes of the segmentation objects, which are necessary for the platybrowser table functionality.
If tables do not change between versions, they can be stored as soft-links to the old version.
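As a sketch of the soft-link convention, an unchanged table could be linked from a new patch version to the previous one like this (the version numbers are placeholders):

```
cd data/0.6.4/tables/sbem-6dpf-1-whole-segmented-cells-labels
ln -s ../../../0.6.3/tables/sbem-6dpf-1-whole-segmented-cells-labels/default.csv default.csv
```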
## Usage
We provide three scripts to update the respective release digit:

- `update_patch.py`: Create new version folder and update derived data.
- `update_minor.py`: Create new version folder and add derived data.
- `update_major.py`: Create new version folder and add primary data.
All three scripts take the path to a json file as argument. The json needs to encode which data to update / add according to the following specifications:

For `update_patch`, the json needs to contain a dictionary with the two keys `segmentations` and `tables`.
Each key needs to map to a list that contains (valid) names of segmentations. For names listed in `segmentations`,
the segmentation AND the corresponding tables will be updated. For `tables`, only the tables will be updated.
The following example would trigger a segmentation and table update for the cell segmentation and a table update for the nucleus segmentation:

```
{"segmentations": ["sbem-6dpf-1-whole-segmented-cells-labels"],
 "tables": ["sbem-6dpf-1-whole-segmented-nuclei-labels"]}
```
For `update_minor`, the json needs to contain a list of dictionaries. Each dictionary corresponds to new data to add to the platy browser. There are three valid types of data, each with different required and optional fields:

- `images`: New image data. Required fields are `source`, `name` and `input_path`. `source` refers to the primary data the image data is associated with, see naming scheme. `name` specifies the name this image data will have, excluding the naming scheme prefix. `input_path` is the path to the data to add; it needs to be in bdv hdf5 format. The field `is_private` is optional. If it is `true`, the data will not be exposed on the public big data server.
- `static segmentations`: New (static) segmentation data. The required fields are `source`, `name` and `segmentation_path` (corresponding to `input_path` in `images`). The fields `table_path_dict` and `is_private` are optional. `table_path_dict` specifies the tables associated with the segmentation as a dictionary `{"table_name1": "/path/to/table1.csv", ...}`. If given, one of the table names must be `default`.
- `dynamic segmentations`: New (dynamic) segmentation data. The required fields are `source`, `name`, `paintera_project` and `resolution`. `paintera_project` specifies path and key of an n5 container storing paintera corrections for this segmentation. `resolution` is the segmentation's resolution in micrometers. The fields `table_update_function` and `is_private` are optional. `table_update_function` can be specified to register a function that generates tables for this segmentation. The function must be importable from `scripts.attributes`.

The following example would add a new prospr gene to the images and a new static and a new dynamic segmentation derived from the em data (all paths and the `resolution` values are placeholders):
[{"source": "prospr-6dpf-1-whole", "name": "new-gene-MED", "input_path": "/path/to/new-gene-data.xml"}
{"source": "sbem-6dpf-1-whole", "name": "new-static-segmentation", "segmentation_path": "/path/to/new-segmentation-data.xml",
"table_path_dict": {"default": "/path/to/default-table.csv", "custom": "/path/to/custom-table.csv"}},
{"source": "sbem-6dpf-1-whole", "name": "new-dynamic-segmentation", "paintera_project": ["/path/to/dynamic-segmentation.n5", "/paintera/project"],
"table_update_function": "new_update_function"}]
For `update_major`, the json needs to contain a dictionary. The dictionary keys correspond to new primary sources (cf. naming scheme) to add to the platy browser. Each key needs to map to a list of data entries. The specification of these entries corresponds to `update_minor`, except that the field `source` is not necessary.
The following example would add a new primary data source (FIBSEM) and add the corresponding raw data as private data:

```
{"fib-6dpf-1-whole": [{"name": "raw", "input_path": "/path/to/fib-raw.xml", "is_private": true}]}
```
See `example_updates/` for additional json update files.

For now, we do not add any files to version control automatically. So after calling one of the update scripts, you must add all new files yourself and then make a release via `git tag -a X.Y.Z -m "DESCRIPTION"`.
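Putting it together, a sketch of a full release flow (the json file, commit message and version number are placeholders, and the commit step is implied rather than prescribed by the scripts):

```
python update_major.py new_source.json   # creates the new version folder data/X.Y.Z/
git add data/X.Y.Z
git commit -m "Add new primary data source"
git tag -a X.Y.Z -m "DESCRIPTION"
```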
In addition, the script `make_dev_folder.py` can be used to create a development folder. It copies the most recent release folder into a folder prefixed with `dev-`, which will not be put under version control.
## Installation
The data is currently hosted on the arendt EMBL share, where a conda environment with all necessary dependencies is available. This environment is used by default.
It can be installed elsewhere using the `environment.yaml` file we provide:

```
conda env create -f environment.yaml
```
## BigDataServer
The platy browser can be served with BigDataServer. On the EMBL server, you can start it from the misc directory of one of the version folders:

```
cd data/X.Y.Z/misc
java -jar /g/cba/exchange/bigdataserver/bigdataviewer-server-2.1.2-jar-with-dependencies.jar -d bdv_server.txt
```
## Data generation
In addition to the data, the scripts for generating the registration and for producing the derived data are also collected here:
### Registration
The folder `registration` contains:

- `transfer_ProSPr_data`: This folder contains the scripts needed to copy and process the ProSPr output to `/g/arendt/EM_6dpf_segmentation/platy-browser-data/data/rawdata/prospr`. It reads the .tif files that will be registered to the EM (e.g. MEDs, tissue manual segmentations, reference), mirrors them along the x axis, adds size information (0.55 um/px) and deals with gene names. Run it on the cluster:
  ```
  sbatch ./ProSPr_copy_and_mirror.sh
  ```
- `ProSPr_files_for_registration`: The three files that guide the transformation of the prospr space into the EM space.
### Segmentation
`scripts/segmentation` contains the scripts to generate the derived segmentations with automated segmentation approaches.

`deprecated/make_initial_version.py` was used to generate the initial data in `/data/0.0.0`.