*.h5
__pycache__/
tmp*
.idea/*
*.DS_Store
data/dev-*
*.tif
*.swp
software
*.zip
*.n5
data/bkp
data/rawdata/evaluation
@@ -5,24 +5,24 @@ Data and data-generation for the [platybrowser](https://github.com/embl-cba/fiji
## Data storage
Image data and derived data for all versions are stored in the folder `data`.
We follow a versioning scheme inspired by [semantic versioning](https://semver.org/), hence all version
numbers are given as `MAJOR.MINOR.PATCH`.
- `PATCH` is increased if the derived data is updated, e.g. due to corrections in some segmentation or new attributes in some table. It is increased by `update_patch.py`.
- `MINOR` is increased if new derived data is added, e.g. a new segmentation for some structure or a new attribute table. It is increased by `update_minor.py`.
- `MAJOR` is increased if new image / raw data is added, e.g. a new animal registered to the atlas or new genes. It is increased by `update_major.py`.
For a given version `X.Y.Z`, the data is stored in the directory `/data/X.Y.Z/` with subfolders:
- `images`: Raw image or gene expression data. Contains bigdata-viewer xml and hdf5 files. The hdf5 files are not under version control.
- `misc`: Miscellaneous data.
- `segmentations`: Segmentation volumes derived from the image data.
- `tables`: CSV tables with attributes derived from image data and segmentations.
### File naming
Xml / hdf5 filenames must adhere to the following naming scheme in order to clearly identify the origin of the data:
the names must be prefixed by the header `MODALITY-STAGE-ID-REGION`, where
- `MODALITY` is a shorthand for the imaging modality used to obtain the data, e.g. `sbem` for serial blockface electron microscopy.
- `STAGE` is a shorthand for the developmental stage, e.g. `6dpf` for six days post fertilisation.
@@ -32,7 +32,6 @@ the names must be prefixed by the header `MODALITY-STAGE-ID-REGION`, where
Currently, the data contains the three modalities
- `sbem-6dpf-1-whole`
- `prospr-6dpf-1-whole`
- `fibsem-6dpf-1-parapod`
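As an illustration, a filename can be checked against this prefix convention with a simple pattern. This is a sketch only, assuming lower-case shorthands and a numeric stage / id; the pattern is not part of the repository:
```
# hypothetical sanity check for the MODALITY-STAGE-ID-REGION prefix
import re

PREFIX = re.compile(r"^[a-z]+-\d+dpf-\d+-[a-z]+")
assert PREFIX.match("sbem-6dpf-1-whole-raw.xml")
assert PREFIX.match("prospr-6dpf-1-whole-ache.xml")
```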
### Table storage
@@ -40,22 +39,109 @@ Derived attributes are stored in csv tables. Tables must be associated with a se
All tables associated with a given segmentation must be stored in the sub-directory `tables/segmentation-name`.
If this directory exists, it must at least contain the file `default.csv` with spatial attributes of the segmentation objects, which are necessary for the platybrowser table functionality.
If tables do not change between versions, they can be stored as soft-links to the old version.
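For example, a table folder that did not change could be linked back to the previous version like this (a minimal sketch, version numbers hypothetical):
```
import os

prev = "data/0.6.4/tables/sbem-6dpf-1-whole-segmented-cells-labels"
new = "data/0.6.5/tables/sbem-6dpf-1-whole-segmented-cells-labels"

# use a relative link, so the data folder stays relocatable
os.symlink(os.path.relpath(prev, os.path.dirname(new)), new)
```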
## Usage
We provide three scripts to update the respective release digit:
- `update_patch.py`: Create new version folder and update derived data.
- `update_minor.py`: Create new version folder and add derived data.
- `update_major.py`: Create new version folder and add primary data.
All three scripts take the path to a json file as argument. The json needs to encode which data to update/add
according to the following specifications:
For `update_patch`, the json needs to contain a dictionary with the two keys `segmentations` and `tables`.
Each key needs to map to a list that contains (valid) names of segmentations. For names listed in `segmentations`,
the segmentation AND corresponding tables will be updated. For `tables`, only the tables will be updated.
The following example would trigger segmentation and table update for the cell segmentation and a table update for the nucleus segmentation:
```
{"segmentations": ["sbem-6dpf-1-whole-segmented-cells-labels"],
"tables": ["sbem-6dpf-1-whole-segmented-nuclei-labels"]}
```
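The json can be written with standard tooling and then passed to the script; a minimal sketch (the file name is hypothetical):
```
import json

spec = {"segmentations": ["sbem-6dpf-1-whole-segmented-cells-labels"],
        "tables": ["sbem-6dpf-1-whole-segmented-nuclei-labels"]}
with open("patch_update.json", "w") as f:
    json.dump(spec, f, indent=2)

# then run: python update_patch.py patch_update.json
```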
For `update_minor`, the json needs to contain a list of dictionaries. Each dictionary corresponds to new
data to add to the platy browser. There are three valid types of data, each with different required and optional fields:
- `images`: New image data. Required fields are `source`, `name` and `input_path`. `source` refers to the primary data the image data is associated with, see [naming scheme](https://git.embl.de/tischer/platy-browser-tables#file-naming). `name` specifies the name this image data will have, excluding the naming scheme prefix. `input_path` is the path to the data to add, which needs to be in bdv hdf5 format. The field `is_private` is optional; if it is `true`, the data will not be exposed on the public big data server.
- `static segmentations`: New (static) segmentation data. The required fields are `source`, `name` and `segmentation_path` (corresponding to `input_path` in `images`). The fields `table_path_dict` and `is_private` are optional. `table_path_dict` specifies tables associated with the segmentation as a dictionary `{"table_name1": "/path/to/table1.csv", ...}`. If given, one of the table names must be `default`.
- `dynamic segmentations`: New (dynamic) segmentation data. The required fields are `source`, `name`, `paintera_project` and `resolution`. `paintera_project` specifies the path and key of an n5 container storing paintera corrections for this segmentation. `resolution` is the segmentation's resolution in micrometers. The fields `table_update_function` and `is_private` are optional. `table_update_function` can be specified to register a function to generate tables for this segmentation. The function must be importable from `scripts.attributes`.
The following example would add a new prospr gene to the images as well as new static and dynamic segmentations derived from the EM data:
```
[{"source": "prospr-6dpf-1-whole", "name": "new-gene-MED", "input_path": "/path/to/new-gene-data.xml"}
{"source": "sbem-6dpf-1-whole", "name": "new-static-segmentation", "segmentation_path": "/path/to/new-segmentation-data.xml",
"table_path_dict": {"default": "/path/to/default-table.csv", "custom": "/path/to/custom-table.csv"}},
{"source": "sbem-6dpf-1-whole", "name": "new-dynamic-segmentation", "paintera_project": ["/path/to/dynamic-segmentation.n5", "/paintera/project"],
"table_update_function": "new_update_function"}]
```
For `update_major`, the json needs to contain a dictionary. The dictionary keys correspond to new primary sources (cf. [naming scheme](https://git.embl.de/tischer/platy-browser-tables#file-naming))
to add to the platy browser. Each key needs to map to a list of data entries. The specification of these entries corresponds to `update_minor`, except that the field `source` is not necessary.
The following example would add a new primary data source (FIBSEM) and add the corresponding raw data as private data:
```
{"fib-6dpf-1-whole": [{"name": "raw", "input_path": "/path/to/fib-raw.xml", "is_private": "true"}]}
```
See `example_updates/` for additional json update files.
For now, we do not add any files to version control automatically. So after calling one of the update
scripts, you must add all new files yourself and then make a release via `git tag -a X.Y.Z -m "DESCRIPTION"`.
In addition, the script `make_dev_folder.py` can be used to create a development folder. It copies the most
recent release folder into a folder prefixed with `dev-`, which will not be put under version control.
The script `update_registration.py` can be used to update registered data with a new transformation.
It creates a new patch version folder and updates all relevant data.
## Installation
The data is currently hosted on the arendt EMBL share, where a conda environment with all necessary dependencies is
available. This environment is used by default.
It can be installed elsewhere using the `environment.yaml` file we provide:
```
conda env create -f environment.yaml
```
## BigDataServer
The platy browser can be served with [BigDataViewerServer](https://github.com/bigdataviewer/bigdataviewer-server).
On the EMBL server, you can start it from the `misc` directory of one of the version folders:
```
cd data/X.Y.Z/misc
java -jar /g/cba/exchange/bigdataserver/bigdataviewer-server-2.1.2-jar-with-dependencies.jar -d bdv_server.txt
```
## Data generation
In addition to the data, the scripts for generating the registration and for producing the derived data are also collected here:
### Registration
The folder `registration` contains the transformations for the different registration versions as well as the scripts
to generate / curate the registration targets. You can use the script `apply_registration.py` to apply a registration transformation
to a new input file.
<!---
- `transfer_ProSPr_data`. This folder contains the scripts needed to copy and process the ProSPr output to
'/g/arendt/EM_6dpf_segmentation/platy-browser-data/data/rawdata/prospr'.
It reads the .tif files that will be registered to the EM (e.g. MEDs, tissue manual segmentations, reference),
mirrors them in the x axis, adds size information (0.55um/px), and deals with gene names.
Run it on the cluster:
```
sbatch ./ProSPr_copy_and_mirror.sh
```
- `ProSPr_files_for_registration`. The three files to guide the transformation of prospr space into the EM.
--->
### Segmentation
`scripts/segmentation` contains the scripts to generate the derived segmentations with automated segmentation approaches.
`deprecated/make_initial_version.py` was used to generate the initial data in `/data/0.0.0`.
#! /g/arendt/EM_6dpf_segmentation/platy-browser-data/software/conda/miniconda3/envs/platybrowser/bin/python
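# Analysis script: compute mean and standard deviation of the cell volumes
# for a given region, based on the region, nucleus and morphology tables.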
import argparse
import os
import numpy as np
from mmpb import get_latest_version
from mmpb.analysis import get_region_ids, get_morphology_attribute
def cell_volumes(region_name, version):
    # paths are hard-coded, so we need to change the pwd to '..'
os.chdir('..')
try:
if version == '':
version = get_latest_version()
region_table_path = "data/%s/tables/sbem-6dpf-1-whole-segmented-cells-labels/regions.csv" % version
nucleus_table_path = "data/%s/tables/sbem-6dpf-1-whole-segmented-cells-labels/cells_to_nuclei.csv" % version
label_ids = get_region_ids(region_table_path, nucleus_table_path, region_name)
morpho_table_path = "data/%s/tables/sbem-6dpf-1-whole-segmented-cells-labels/morphology.csv" % version
volumes = get_morphology_attribute(morpho_table_path, "shape_volume_in_microns", query_ids=label_ids)
print("Found average volume for region", region_name, ":")
print(np.mean(volumes), "+-", np.std(volumes), "cubic micron")
except Exception as e:
os.chdir('analysis')
raise e
os.chdir('analysis')
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Compute average volume of cells for a given region.')
parser.add_argument('region_name', type=str,
help='Name of the region.')
parser.add_argument('--version', type=str, default='',
help='Version of the platy browser data. Default is latest.')
args = parser.parse_args()
cell_volumes(args.region_name, args.version)
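# Export the midgut cells as a binary mask volume in bdv-hdf5 format,
# based on the 'midgut' column of a corrected region table.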
import h5py
from pybdv import make_bdv
import pandas as pd
import numpy as np
def make_midgut_volume():
table_path = './corrected_regions_1.csv'
table = pd.read_csv(table_path, sep='\t')
label_ids = table['label_id'].values
midgut = table['midgut'].values
assert len(label_ids) == len(midgut)
assert np.array_equal(np.unique(midgut), np.array([0, 1]))
midgut_ids = label_ids[midgut == 1]
scale = 4
seg_path = '../../data/0.6.5/segmentations/sbem-6dpf-1-whole-segmented-cells-labels.h5'
key = 't00000/s00/%i/cells' % scale
res = [0.4, 0.32, 0.32]
with h5py.File(seg_path, 'r') as f:
seg = f[key][:]
    # cast after multiplying; 255 does not fit into int8
    midgut_seg = (255 * np.isin(seg, midgut_ids)).astype('uint8')
out_path = './sbem-6dpf-1-whole-segmented-midgut.h5'
factors = 3 * [[2, 2, 2]]
make_bdv(midgut_seg, out_path, factors, resolution=res, unit='micrometer',
convert_dtype=False)
if __name__ == '__main__':
make_midgut_volume()
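# Recompute the region assignment table for the cell segmentation of a given version.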
import os
import h5py
import numpy as np
from mmpb.default_config import write_default_global_config
from mmpb.attributes.region_attributes import region_attributes
from mmpb.files.xml_utils import get_h5_path_from_xml
def recompute_table(version):
version_folder = '../../data/%s' % version
image_folder = os.path.join(version_folder, 'images')
seg_folder = os.path.join(version_folder, 'segmentations')
table_folder = os.path.join(version_folder, 'tables')
seg_path = os.path.join(seg_folder, 'sbem-6dpf-1-whole-segmented-cells-labels.xml')
seg_path = get_h5_path_from_xml(seg_path, return_absolute_path=True)
with h5py.File(seg_path, 'r') as f:
n_labels = int(f['t00000/s00/0/cells'].attrs['maxId'] + 1)
label_ids = np.arange(n_labels)
tmp_folder = './tmp_regions'
write_default_global_config(os.path.join(tmp_folder, 'configs'))
target = 'local'
max_jobs = 48
region_out = os.path.join(table_folder, 'sbem-6dpf-1-whole-segmented-cells-labels', 'regions.csv')
region_attributes(seg_path, region_out,
image_folder, seg_folder,
label_ids, tmp_folder, target, max_jobs)
if __name__ == '__main__':
recompute_table('0.6.5')
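# Print the total voxel count of the raw data and the extent of the animal,
# computed from the bounding box of the inside mask.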
import h5py
import numpy as np
def number_of_voxels():
p = '../data/rawdata/sbem-6dpf-1-whole-raw.h5'
with h5py.File(p, 'r') as f:
ds = f['t00000/s00/0/cells']
shape = ds.shape
n_vox = np.prod(list(shape))
print("Number of voxel:")
print(n_vox)
print("corresponds to")
print(float(n_vox) / 1e12, "TVoxel")
def animal():
p = '../data/rawdata/sbem-6dpf-1-whole-mask-inside.h5'
with h5py.File(p, 'r') as f:
mask = f['t00000/s00/0/cells'][:]
bb = np.where(mask > 0)
mins = [b.min() for b in bb]
maxs = [b.max() for b in bb]
size = [ma - mi for mi, ma in zip(mins, maxs)]
print("Animal size in pixel:")
print(size)
res = [.4, .32, .32]
size = [si * re for si, re in zip(size, res)]
print("Animal size in micron:")
print(size)
if __name__ == '__main__':
number_of_voxels()
animal()
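# Create a ganglion segmentation and the corresponding attribute tables
# by mapping the cells of the cell segmentation to head ganglia.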
import os
import z5py
import h5py
import numpy as np
import pandas as pd
import nifty.tools as nt
from pybdv import make_bdv
from mmpb.attributes.base_attributes import base_attributes
# Segmentation version: 0.2.1
def write_assignments():
tab = pd.read_csv('./head_ganglions_table_v1.csv', sep='\t')
g = tab['Head_ganglion_1'].values.astype('uint32')
print(g.shape)
with z5py.File('ganglion.n5') as f:
f.create_dataset('assignments', data=g, compression='gzip', chunks=(10000,))
def write_ganglia():
tab = pd.read_csv('./head_ganglions_table_v1.csv', sep='\t')
g = tab['Head_ganglion_1'].values.astype('uint32')
g[0] = 0
max_id = int(g.max())
print("Max-id:", max_id)
seg_path = os.path.join('/g/arendt/EM_6dpf_segmentation/platy-browser-data/data/0.2.1',
'segmentations/sbem-6dpf-1-whole-segmented-cells-labels.h5')
seg_key = 't00000/s00/3/cells'
out_path = './sbem-6dpf-1-whole-segmented-ganglia-labels.h5'
print("Reading segmentation ...")
with h5py.File(seg_path, 'r') as f:
ds = f[seg_key]
seg = ds[:].astype('uint32')
print("To ganglion segmentation ...")
seg = nt.take(g, seg)
seg = seg.astype('int16')
print("Writing segmentation ...")
n_scales = 4
res = [.2, .16, .16]
downscale_factors = n_scales * [[2, 2, 2]]
make_bdv(seg, out_path, downscale_factors,
resolution=res, unit='micrometer')
    # open in append mode so the max id attribute can be written
    with h5py.File(out_path, 'a') as f:
ds = f['t00000/s00/0/cells']
ds.attrs['maxId'] = max_id
def make_table():
input_path = './sbem-6dpf-1-whole-segmented-ganglia-labels.h5'
input_key = 't00000/s00/0/cells'
output_path = './default.csv'
resolution = [.2, .16, .16]
tmp_folder = './tmp_ganglia'
target = 'local'
max_jobs = 48
base_attributes(input_path, input_key, output_path, resolution,
tmp_folder, target, max_jobs, correct_anchors=False)
def make_overlap_table():
from mmpb.attributes.util import write_csv, node_labels
tmp_folder = './tmp_ganglia'
target = 'slurm'
max_jobs = 200
seg_path = os.path.join('/g/arendt/EM_6dpf_segmentation/platy-browser-data/data/0.6.5',
'segmentations/sbem-6dpf-1-whole-segmented-cells-labels.h5')
seg_key = 't00000/s00/0/cells'
input_path = './sbem-6dpf-1-whole-segmented-ganglia-labels.h5'
input_key = 't00000/s00/0/cells'
prefix = 'ganglia'
ganglia_labels = node_labels(seg_path, seg_key,
input_path, input_key, prefix,
tmp_folder, target, max_jobs)
n_labels = len(ganglia_labels)
n_ganglia = ganglia_labels.max() + 1
assert n_ganglia == 19
col_names = ['label_id', 'ganglion_id']
n_cols = len(col_names)
reg_table = os.path.join('/g/arendt/EM_6dpf_segmentation/platy-browser-data/data/0.6.5',
'tables/sbem-6dpf-1-whole-segmented-cells-labels/regions.csv')
reg_table = pd.read_csv(reg_table, sep='\t')
print(reg_table.columns)
assert len(reg_table) == len(ganglia_labels)
have_ganglia = ganglia_labels > 0
head_ids = reg_table['head'].values > 0
head_ids = np.logical_and(head_ids, ~have_ganglia)
print("Rest of head:", head_ids.sum())
ganglia_labels[head_ids] = n_ganglia
have_labels = ganglia_labels > 0
muscle_ids = reg_table['muscle'].values > 0
muscle_ids = np.logical_and(muscle_ids, have_labels)
print("Muscles in labels:", muscle_ids.sum())
ganglia_labels[muscle_ids] = 0
table = np.zeros((n_labels, n_cols))
table[:, 0] = np.arange(n_labels)
table[:, 1] = ganglia_labels
out_path = './ganglia_ids.csv'
write_csv(out_path, table, col_names)
if __name__ == '__main__':
# write_ganglia()
# make_table()
make_overlap_table()
#! /g/arendt/EM_6dpf_segmentation/platy-browser-data/software/conda/miniconda3/envs/platybrowser/bin/python
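# Analysis script: count the cells that co-express a given set of genes
# above an expression threshold.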
import argparse
import os
import json
import numpy as np
from mmpb import get_latest_version
from mmpb.analysis import get_cells_expressing_genes
def count_gene_expression(gene_names, threshold, version, query):
    # check the function argument, not the global args
    if query != '':
assert os.path.exists(query)
with open(query) as f:
query_ids = np.array(json.load(f))
else:
query_ids = None
    # paths are hard-coded, so we need to change the pwd to '..'
os.chdir('..')
try:
if version == '':
version = get_latest_version()
# TODO enable using vc assignments once we have them on master
table_path = 'data/%s/tables/sbem-6dpf-1-whole-segmented-cells-labels/genes.csv' % version
ids = get_cells_expressing_genes(table_path, threshold, gene_names)
if query_ids is not None:
ids = ids[np.isin(ids, query_ids)]
n = len(ids)
print("Found", n, "cells expressing:", ",".join(gene_names))
except Exception as e:
os.chdir('analysis')
raise e
os.chdir('analysis')
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Compute number of cells co-expressing genes.')
parser.add_argument('gene_names', type=str, nargs='+',
                        help='Names of the genes for which to check co-expression.')
parser.add_argument('--threshold', type=float, default=.5,
help='Threshold to count gene expression. Default is 0.5.')
parser.add_argument('--version', type=str, default='',
help='Version of the platy browser data. Default is latest.')
parser.add_argument('--query', type=str, default='',
help='Path to json with list of cell ids to restrict the query to.')
args = parser.parse_args()
count_gene_expression(args.gene_names, args.threshold, args.version, args.query)
cilia_id cell_id
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 21590
12 21590
13 0
14 0
15 21590
16 0
17 0
18 0
19 0
20 21911
21 21911
22 21910
23 21910
24 21910
25 21910
26 21910
27 0
28 21910
29 21910
30 21910
31 21910
32 21594
33
34 21594
35 21911
36 21911
37 0
38 0
39 21594
40 21590
41 21590
42 21590
43 21594
44 21594
45 21594
46 0
47
48 0
49 0
50 0
51 0
52 0
53 21594
54 21910
55 21910
56 21910
57 21910
58 21910
59 21911
60 21911
61
62 0
63 0
64 0
65 0
66 0
67 21910
68 21910
69 21910
70 21910
71 0
72 21911
73 21911
74 21911
75 0
76 21911
77 0
78
79 0
80 0
81 0
82
83
84 0
85 0
86 0
87 21915
88 21915
89 21915
90 0
91 21915
92 21915
93 21911
94 0
95 0
96
97 0
98 0
99 21915
100 21915
101 0
102 21915
103 21911
104 21915
105 21911
106 0
107 21910
108 0
109 0
110 21915
111 0
112 0
113 0
114 0
115 0
116 0
117 21910
118 21911
119 0
120 0
121 0
122 21915
123 0
124 0
125 0
126
127 0
128
129
130 0
131 0
132 0
133 0
134 21904
135 21904
136 21915
137 21915
138
139 0
140 0
141
142 0
143 0
144 0
145 0
146 0
147 0
148 0
149 22699
150 22515
151 22515
152 22182
153 22515
154 22515
155 22515
156 22515
157 22515
158 22515
159 22515
160 22515
161 22515
162 22182
163 22515
164 22515
165 22515
166 22515
167 22515
168 22515
169 21904
170
171
172 0
173 0
174 0
175 0
176 0
177 0
178 0
179 0
180 22182
181 22182
182 22182
183 22182
184 22515
185 22515
186 22515
187 22699
188 22699
189 0
190 22699
191 22699
192 0
193 0
194 22699
195 22699
196 22699
197 22699
198 22699
199 22699
200 22699
201 0
202 22699
203 22699
204 22700
205 22700
206 22700
207 22700
208 22700
209 22700
210 22700
211
212
213
214
215
216 22181
217 22181
218 22181
219 22181
220 22182
221 0
222 0
223 0
224 22515
225 22699
226 0
227 22515
228 0
229 0
230 0
231 22699
232 21904
233 0
234 0
235 22181
236 0
237 0
238 0
239 0
240 0
241 0
242 0
243 0
244 0
245 0
246 0
247 0
248 0
249 0
250 0
251 0
252 0
253 22699
254 0
255 22181
256 0
257
258
259 22182
260 0
261 0
262 0
263 0
264 0
265 0
266 0
267 0
268 0
269 0
270 0
271 0
272 22700
273 22700
274 0
275 0
276 0
277 0
278 0
279 0
280 0
281 0
282 0
283 0
284 0
285 0
286 0
287 0
288 0
289 0
290 0
291 0
292 22700
293 0
294 22700
295 22700
296 22700
297 22700
298 22700
299 22700
300 22700
301 22700
302 0
303 0
304 0
305 0
306 0
307 0
308 0
309 0
310 0
311 0
312 0
313 0
314 22584
315 22584
316 0
317 0
318 0
319 22584
320 0
321 22584
322 22925
323 22925
324 22584
325 22925
326 22925
327 0
328 0
329 0
330 22584
331 0
332 22925
333 0
334 0
335 0
336 0
337 22584
338 0
339 0
340 24024
341 0
342 0
343 0
344 0
345 24024
346 24024
347 0
348 0
349 0
350 0
351 0
352 24024
353 24024
354 0
355 0
356 0
357 0
358 0
359 0
360 0
361 0
362 0
363 0
364 0
365 0
366 0
367 0
368 24024
369 0
370 0
371 0
372 0
373 0
374 0
375 0
376 0
377 0
378 0
379 0
380 0
381 0
382 0
383 0
384 0
385 21904
386 21910
387 21910
388 21910
389 22181
390 22925
391 0
392 0
393 0
394 0
395 0
396 0
397 0
398 0
399 0
400 0
401 0
402 0
403 0
404 0
405 0
406 0
407 0
408 0
409 0
410 0
411 0
412 0
413 0
414 0
415 0
416 0
417 0
418 0
419 0
420 0
421 0
422 0
423 0
424 0
425 0
426 0
427 0
428 0
429 0
430 0
431 0
432 0
433 0
434 0
435 0
436 0
437 0
438 0
439 0
440 0
441 0
442 0
443 0
444 0
445 0
446 0
447 0
448 0
449 0
450 0
451 0
452 0
453 0
454 0
455 0
456 0
457 0
458 0
459 0
460 0
461 0
462 0
463 0
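# Check which nephridia cells carry cilia and plot the number of cilia
# per cell for the left and right nephridium.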
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
def get_nephr_ids(version):
table_path = '../../data/%s/tables/sbem-6dpf-1-whole-segmented-cells-labels/regions.csv' % version
table = pd.read_csv(table_path, sep='\t')
nephr_ids = table['nephridia'].values
right_nephr_ids = np.where(nephr_ids == 1)[0]
left_nephr_ids = np.where(nephr_ids == 2)[0]
return right_nephr_ids, left_nephr_ids
def check_cell_ids(version):
table_path = '../../data/%s/tables/sbem-6dpf-1-whole-segmented-cilia-labels/cell_mapping.csv' % version
right_nephr_ids, left_nephr_ids = get_nephr_ids(version)
table = pd.read_csv(table_path, sep='\t')
cell_ids = table['cell_id'].values
matched_right = []
matched_left = []
for cell_id in cell_ids:
if cell_id in right_nephr_ids:
matched_right.append(cell_id)
elif cell_id in left_nephr_ids:
matched_left.append(cell_id)
else:
assert cell_id == 0 or np.isnan(cell_id), str(cell_id)
matched_right = set(matched_right)
matched_left = set(matched_left)
# unmatched_right = set(right_nephr_ids) - matched_right
# unmatched_left = set(left_nephr_ids) - matched_left
print("Number of cells right:")
print("Total:", len(right_nephr_ids))
print("With cilia:", len(matched_right))
print("Number of cells left:")
print("Total:", len(left_nephr_ids))
print("With cilia:", len(matched_left))
def plot_cilia_per_cell(version):
counts_left = []
counts_right = []
table_path = '../../data/%s/tables/sbem-6dpf-1-whole-segmented-cilia-labels/cell_mapping.csv' % version
table = pd.read_csv(table_path, sep='\t')
cell_ids = table['cell_id']
cell_ids = cell_ids[cell_ids != 0]
cell_ids = cell_ids[~np.isnan(cell_ids)]
cell_ids = cell_ids.astype('uint32')
right_nephr_ids, left_nephr_ids = get_nephr_ids(version)
unique_cell_ids = np.unique(cell_ids)
total_count = 0
for cell_id in unique_cell_ids:
n_cilia = np.sum(cell_ids == cell_id)
total_count += n_cilia
if cell_id in right_nephr_ids:
counts_right.append(n_cilia)
elif cell_id in left_nephr_ids:
counts_left.append(n_cilia)
else:
print(cell_id)
counts_right.sort(reverse=True)
counts_left.sort(reverse=True)
assert len(counts_right) == len(counts_left)
print("Total number of cilia:", total_count)
print("Right counts:")
print(sum(counts_right))
print("Left counts:")
print(sum(counts_left))
fig, axes = plt.subplots(2)
x = np.arange(7)
ax = axes[0]
ax.set_title('Right nephridium')
ax.bar(x, height=counts_right)
ax.set_ylabel('Cilia per cell')
ax = axes[1]
ax.set_title('Left nephridium')
ax.bar(x, height=counts_left)
ax.set_ylabel('Cilia per cell')
plt.show()
if __name__ == '__main__':
version = '0.6.5'
check_cell_ids(version)
plot_cilia_per_cell(version)
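# Create a nephridia segmentation from the cells that carry cilia
# and append the nephridia assignment to the region table.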
import numpy as np
import pandas as pd
import elf.parallel
import vigra
from elf.io import open_file
from pybdv import make_bdv
# leaving this here to reproduce issues with elf.parallel.label
def make_nephridia_segmentation_depr():
table_path = '../../data/0.6.5/tables/sbem-6dpf-1-whole-segmented-cilia-labels/cell_mapping.csv'
seg_path = '../../data/0.6.5/segmentations/sbem-6dpf-1-whole-segmented-cells-labels.h5'
# TODO
out_path = '/g/kreshuk/pape/Work/nephr_tmp.n5'
out_key = 'test'
table = pd.read_csv(table_path, sep='\t')
cell_ids = table['cell_id'].values
cell_ids = np.unique(cell_ids)
if cell_ids[0] == 0:
cell_ids = cell_ids[1:]
print(cell_ids)
scale = 4
key = 't00000/s00/%i/cells' % scale
with open_file(seg_path, 'r') as f, open_file(out_path) as f_out:
ds = f[key]
print(ds.shape)
out = f_out.require_dataset(out_key, shape=ds.shape, chunks=ds.chunks, compression='gzip',
dtype='uint8')
print("Isin ...")
out = elf.parallel.isin(ds, cell_ids, out=out, n_threads=16, verbose=True)
print("Label ...")
# FIXME label is still not doing a completely accurate job of merging across block boundaries
out = elf.parallel.label(out, out, n_threads=16, verbose=True)
print("Compute max ...")
max_id = elf.parallel.max(out, n_threads=16, verbose=True)
print("Max component id:", max_id)
# working version: connected components computed with vigra instead of elf.parallel.label
def make_nephridia_segmentation():
table_path = '../../data/0.6.5/tables/sbem-6dpf-1-whole-segmented-cilia-labels/cell_mapping.csv'
seg_path = '../../data/0.6.5/segmentations/sbem-6dpf-1-whole-segmented-cells-labels.h5'
out_path = '../../data/0.6.5/segmentations/sbem-6dpf-1-whole-segmented-nephridia.xml'
table = pd.read_csv(table_path, sep='\t')
cell_ids = table['cell_id'].values
cell_ids = np.unique(cell_ids)
if cell_ids[0] == 0:
cell_ids = cell_ids[1:]
print(cell_ids)
scale = 4
key = 't00000/s00/%i/cells' % scale
with open_file(seg_path, 'r') as f:
ds = f[key]
seg = ds[:].astype('uint32')
bshape = (32, 256, 256)
tmp = np.zeros_like(seg)
print("Isin ...")
tmp = elf.parallel.isin(seg, cell_ids, out=tmp, n_threads=16, verbose=True, block_shape=bshape)
print("Label ...")
tmp = vigra.analysis.labelVolumeWithBackground(tmp)
print("Size filter ...")
ids, counts = elf.parallel.unique(tmp, return_counts=True, n_threads=16, verbose=True, block_shape=bshape)
keep_ids = np.argsort(counts)[::-1]
keep_ids = ids[keep_ids[:3]]
assert keep_ids[0] == 0
out = np.zeros(tmp.shape, dtype='uint8')
for new_id, keep_id in enumerate(keep_ids[1:], 1):
out[tmp == keep_id] = new_id
factors = 3 * [[2, 2, 2]]
res = [.4, .32, .32]
make_bdv(out, out_path, factors, resolution=res, unit='micrometer')
def append_nephridia_table():
table_path = '../../data/0.6.5/tables/sbem-6dpf-1-whole-segmented-cilia-labels/cell_mapping.csv'
table = pd.read_csv(table_path, sep='\t')
cell_ids = table['cell_id'].values
cell_ids = np.unique(cell_ids)
if cell_ids[0] == 0:
cell_ids = cell_ids[1:]
out_table_path = '../../data/0.6.5/tables/sbem-6dpf-1-whole-segmented-cells-labels/regions.csv'
seg_path = '../../data/0.6.5/segmentations/sbem-6dpf-1-whole-segmented-cells-labels.h5'
nep_path = '../../data/0.6.5/segmentations/sbem-6dpf-1-whole-segmented-nephridia.h5'
table = pd.read_csv(out_table_path, sep='\t')
new_col = np.zeros(len(table), dtype='float32')
print("Loading volumes ...")
scale = 4
key = 't00000/s00/%i/cells' % scale
with open_file(seg_path, 'r') as f:
seg = f[key][:]
scale = 0
key = 't00000/s00/%i/cells' % scale
with open_file(nep_path, 'r') as f:
nep = f[key][:]
assert nep.shape == seg.shape
print("Iterating over cells ...")
for cid in cell_ids:
nid = np.unique(nep[seg == cid])
if 0 in nid:
nid = nid[1:]
assert len(nid) == 1
new_col[cid] = nid
table['nephridia'] = new_col
table.to_csv(out_table_path, sep='\t', index=False)
if __name__ == '__main__':
# make_nephridia_segmentation()
append_nephridia_table()
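# Match cilia to the given nephridia cells and print statistics
# of the cilia lengths and diameters.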
import os
import numpy as np
import pandas as pd
from mmpb.analysis.nephridia import filter_by_size, match_cilia_to_cells
cell_ids = [24449, 22584, 21904, 21590, 21594, 21595, 21910, 21911, 21915]
print("Number of cells:", len(cell_ids))
tmp_path = './cilia_ids.npy'
if os.path.exists(tmp_path):
cilia_ids = np.load(tmp_path)
else:
cell_table = '../data/0.5.2/tables/sbem-6dpf-1-whole-segmented-cells-labels/default.csv'
cilia_path = '../data/0.5.2/segmentations/sbem-6dpf-1-whole-segmented-cilia-labels.h5'
cilia_key = 't00000/s00/2/cells'
cilia_res = [0.1, 0.04, 0.04]
cilia_ids = match_cilia_to_cells(cell_ids, cell_table,
cilia_path, cilia_key, cilia_res)
np.save(tmp_path, cilia_ids)
table_path = '../data/0.5.2/tables/sbem-6dpf-1-whole-segmented-cilia-labels/default.csv'
table_path2 = '../data/0.5.2/tables/sbem-6dpf-1-whole-segmented-cilia-labels/cilia.csv'
table1 = pd.read_csv(table_path, sep='\t')
table2 = pd.read_csv(table_path2, sep='\t')
table2 = table2[['length', 'diameter_mean']]
table = pd.concat([table1, table2], axis=1)
assert len(table1) == len(table2) == len(table)
# note: set_index returns a new dataframe; keep the column so it can be read below
table = table.set_index('label_id', drop=False)
table = table[3:]
table = table.loc[table['length'] > 0.1]
ids = table['label_id'].values
ids = ids[np.isin(ids, cilia_ids)]
print(ids)
table = table.loc[ids]
# plot_sizes(table)
size_threshold = 5000
table = filter_by_size(table, size_threshold)
print("Number of cilia:", len(table))
lens = table["length"].values
for idd, leng in zip(ids, lens):
print(idd, leng)
print("Average len:", np.mean(lens), "+-", np.std(lens))
diameters = table["diameter_mean"].values
print("Average diameter:", np.mean(diameters), "+-", np.std(diameters))
import json
import numpy as np
import pandas as pd
import nifty.tools as nt
# the current table is w.r.t.
# cilia ids from 0.5.3 and we need 0.6.2
# cell ids from 0.3.1 and we need 0.6.1
def propagate_table():
table_path = './20191030_table_ciliaID_cellID'
table = pd.read_csv(table_path, sep='\t')
cilia_ids = table['cilia_id'].values.astype('uint32')
cell_ids = table['cell_id'].values
cell_ids[np.isinf(cell_ids)] = 0
cell_ids[np.isnan(cell_ids)] = 0
cell_ids = cell_ids.astype('uint32')
cell_id_mapping = {24024: 24723, 22925: 23531, 22700: 23296, 22699: 23295,
22584: 23199, 22515: 23132, 22182: 22827, 22181: 22826,
21915: 22549, 21911: 22546, 21910: 22545, 21904: 22541,
21594: 22214, 21590: 22211, 0: 0}
unique_vals, unique_counts = np.unique(list(cell_id_mapping.values()), return_counts=True)
print(unique_vals)
assert (unique_counts == 1).all()
cell_ids = nt.takeDict(cell_id_mapping, cell_ids)
cilia_id_mapping = '../../data/0.6.2/misc/new_id_lut_sbem-6dpf-1-whole-segmented-cilia-labels.json'
with open(cilia_id_mapping) as f:
cilia_id_mapping = json.load(f)
cilia_id_mapping = {int(k): v for k, v in cilia_id_mapping.items()}
cilia_ids = [cilia_id_mapping.get(cil_id, 0) for cil_id in cilia_ids]
cilia_ids = np.array(cilia_ids)
valid_mask = ~(cilia_ids == 0)
cilia_ids = cilia_ids[valid_mask]
cell_ids = cell_ids[valid_mask]
sorter = np.argsort(cilia_ids)
cilia_ids = cilia_ids[sorter]
cell_ids = cell_ids[sorter]
table_out = './20191030_table_ciliaID_cellID_out'
new_table = np.concatenate([cilia_ids[:, None], cell_ids[:, None]], axis=1)
new_table = pd.DataFrame(new_table, columns=['label_id', 'cell_id'])
new_table.to_csv(table_out, sep='\t', index=False)
if __name__ == '__main__':
propagate_table()
<SpimData version="0.2">
<BasePath type="relative">.</BasePath>
<SequenceDescription>
<ImageLoader format="bdv.hdf5">
<hdf5 type="relative">prospr-6dpf-1-whole-ache.h5</hdf5>
</ImageLoader>
<ViewSetups>
<ViewSetup>
<id>0</id>
<name>channel 1</name>
<size>550 518 570</size>
<voxelSize>
<unit>micrometer</unit>
<size>0.5 0.5 0.5</size>
</voxelSize>
<attributes>
<channel>1</channel>
</attributes>
</ViewSetup>
<Attributes name="channel">
<Channel>
<id>1</id>
<name>1</name>
</Channel>
</Attributes>
</ViewSetups>
<Timepoints type="range">
<first>0</first>
<last>0</last>
</Timepoints>
</SequenceDescription>
<ViewRegistrations>
<ViewRegistration setup="0" timepoint="0">
<ViewTransform type="affine">
<affine>0.5 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.5 0.0</affine>
</ViewTransform>
</ViewRegistration>
</ViewRegistrations>
</SpimData>
<SpimData version="0.2">
<BasePath type="relative">.</BasePath>
<SequenceDescription>
<ImageLoader format="bdv.hdf5">
<hdf5 type="relative">prospr-6dpf-1-whole-allcr1.h5</hdf5>
</ImageLoader>
<ViewSetups>
<ViewSetup>
<id>0</id>
<name>channel 1</name>
<size>550 518 570</size>
<voxelSize>
<unit>micrometer</unit>
<size>0.5 0.5 0.5</size>
</voxelSize>
<attributes>
<channel>1</channel>
</attributes>
</ViewSetup>
<Attributes name="channel">
<Channel>
<id>1</id>
<name>1</name>
</Channel>
</Attributes>
</ViewSetups>
<Timepoints type="range">
<first>0</first>
<last>0</last>
</Timepoints>
</SequenceDescription>
<ViewRegistrations>
<ViewRegistration setup="0" timepoint="0">
<ViewTransform type="affine">
<affine>0.5 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.5 0.0</affine>
</ViewTransform>
</ViewRegistration>
</ViewRegistrations>
</SpimData>
<SpimData version="0.2">
<BasePath type="relative">.</BasePath>
<SequenceDescription>
<ImageLoader format="bdv.hdf5">
<hdf5 type="relative">prospr-6dpf-1-whole-allcr2.h5</hdf5>
</ImageLoader>
<ViewSetups>
<ViewSetup>
<id>0</id>
<name>channel 1</name>
<size>550 518 570</size>
<voxelSize>
<unit>micrometer</unit>
<size>0.5 0.5 0.5</size>
</voxelSize>
<attributes>
<channel>1</channel>
</attributes>
</ViewSetup>
<Attributes name="channel">
<Channel>
<id>1</id>
<name>1</name>
</Channel>
</Attributes>
</ViewSetups>
<Timepoints type="range">
<first>0</first>
<last>0</last>
</Timepoints>
</SequenceDescription>
<ViewRegistrations>
<ViewRegistration setup="0" timepoint="0">
<ViewTransform type="affine">
<affine>0.5 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.5 0.0</affine>
</ViewTransform>
</ViewRegistration>
</ViewRegistrations>
</SpimData>
<SpimData version="0.2">
<BasePath type="relative">.</BasePath>
<SequenceDescription>
<ImageLoader format="bdv.hdf5">
<hdf5 type="relative">prospr-6dpf-1-whole-anpra.h5</hdf5>
</ImageLoader>
<ViewSetups>
<ViewSetup>
<id>0</id>
<name>channel 1</name>
<size>550 518 570</size>
<voxelSize>
<unit>micrometer</unit>
<size>0.5 0.5 0.5</size>
</voxelSize>
<attributes>
<channel>1</channel>
</attributes>
</ViewSetup>
<Attributes name="channel">
<Channel>
<id>1</id>
<name>1</name>
</Channel>
</Attributes>
</ViewSetups>
<Timepoints type="range">
<first>0</first>
<last>0</last>
</Timepoints>
</SequenceDescription>
<ViewRegistrations>
<ViewRegistration setup="0" timepoint="0">
<ViewTransform type="affine">
<affine>0.5 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.5 0.0</affine>
</ViewTransform>
</ViewRegistration>
</ViewRegistrations>
</SpimData>
<SpimData version="0.2">
<BasePath type="relative">.</BasePath>
<SequenceDescription>
<ImageLoader format="bdv.hdf5">
<hdf5 type="relative">prospr-6dpf-1-whole-anprb.h5</hdf5>
</ImageLoader>
<ViewSetups>
<ViewSetup>
<id>0</id>
<name>channel 1</name>
<size>550 518 570</size>
<voxelSize>
<unit>micrometer</unit>
<size>0.5 0.5 0.5</size>
</voxelSize>
<attributes>
<channel>1</channel>
</attributes>
</ViewSetup>
<Attributes name="channel">
<Channel>
<id>1</id>
<name>1</name>
</Channel>
</Attributes>
</ViewSetups>
<Timepoints type="range">
<first>0</first>
<last>0</last>
</Timepoints>
</SequenceDescription>
<ViewRegistrations>
<ViewRegistration setup="0" timepoint="0">
<ViewTransform type="affine">
<affine>0.5 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.5 0.0</affine>
</ViewTransform>
</ViewRegistration>
</ViewRegistrations>
</SpimData>
<SpimData version="0.2">
<BasePath type="relative">.</BasePath>
<SequenceDescription>
<ImageLoader format="bdv.hdf5">
<hdf5 type="relative">prospr-6dpf-1-whole-ap2.h5</hdf5>
</ImageLoader>
<ViewSetups>
<ViewSetup>
<id>0</id>
<name>channel 1</name>
<size>550 518 570</size>
<voxelSize>
<unit>micrometer</unit>
<size>0.5 0.5 0.5</size>
</voxelSize>
<attributes>
<channel>1</channel>
</attributes>
</ViewSetup>
<Attributes name="channel">
<Channel>
<id>1</id>
<name>1</name>
</Channel>
</Attributes>
</ViewSetups>
<Timepoints type="range">
<first>0</first>
<last>0</last>
</Timepoints>
</SequenceDescription>
<ViewRegistrations>
<ViewRegistration setup="0" timepoint="0">
<ViewTransform type="affine">
<affine>0.5 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.5 0.0</affine>
</ViewTransform>
</ViewRegistration>
</ViewRegistrations>
</SpimData>