CartoCell, a high-throughput pipeline for accurate 3D image analysis
This tutorial describes how to train our custom 3D ResU-Net DNN and run inference with it in order to reproduce the results obtained in (Andrés-San Román, 2022). Given an initial training dataset of 21 segmented epithelial 3D cysts acquired by confocal microscopy, we follow the CartoCell pipeline (figure below) to automatically segment hundreds of low-resolution cysts in a high-throughput manner.
CartoCell, a high-throughput pipeline for accurate 3D image analysis, unveils cell morphology patterns in epithelial cysts. Jesús Andrés-San Román, Carmen Gordillo-Vázquez, Daniel Franco-Barranco, Laura Morato, Antonio Tagua, Pablo Vicente-Munuera, Ana M. Palacios, María P. Gavilán, Valentina Annese, Pedro Gómez-Gálvez, Ignacio Arganda-Carreras, Luis M. Escudero. [under revision]
In Phase 1, a small dataset of 21 cysts, stained with cell-outline markers, was acquired at high resolution in a confocal microscope. Next, the individual cell instances were segmented. The high-resolution images from Phase 1 provide the accurate and realistic set of data necessary for the following steps.
In Phase 2, both high-resolution raw and label images were down-sampled to create our initial training dataset. Specifically, image volumes were reduced to match the resolution of the images acquired in Phase 3. Using that dataset, a first DNN was trained. We will refer to this first model as model M1.
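The down-sampling step of Phase 2 can be illustrated with a minimal NumPy sketch. It assumes integer reduction factors per axis and is not taken from the CartoCell code; the function names and the choice of block averaging are illustrative assumptions. The point it shows is that raw intensities can be averaged, whereas label volumes should be subsampled without interpolation so that cell identifiers are never blended.

```python
import numpy as np


def _cropped_shape(shape, factors):
    # Largest shape that is an exact multiple of the factors on every axis.
    return tuple(s - s % f for s, f in zip(shape, factors))


def downsample_raw(volume, factors):
    """Block-average a (z, y, x) intensity volume by integer factors (assumed method)."""
    fz, fy, fx = factors
    z, y, x = _cropped_shape(volume.shape, factors)
    v = volume[:z, :y, :x].astype(np.float32)
    v = v.reshape(z // fz, fz, y // fy, fy, x // fx, fx)
    return v.mean(axis=(1, 3, 5))


def downsample_labels(labels, factors):
    """Subsample a label volume by striding, so that no new label values appear."""
    fz, fy, fx = factors
    z, y, x = _cropped_shape(labels.shape, factors)
    return labels[:z, :y, :x][::fz, ::fy, ::fx]
```

Both functions crop the volume to a multiple of the factors first, so the raw and label outputs always have the same shape.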
In Phase 3, a large number of low-resolution stacks of multiple epithelial cysts were acquired. This was a key step to allow the high-throughput analysis of samples, since it greatly reduces acquisition time. Here, we extracted the single-layer and single-lumen cysts by cropping them from the complete stack. This way, we obtained a set of 293 low-resolution images, composed of 84 cysts at 4 days, 113 cysts at 7 days and 96 cysts at 10 days. Next, we applied our trained model M1 to those images and post-processed its output to produce (i) a prediction of individual cell instances (obtained by marker-controlled watershed), and (ii) a prediction of the mask of the full cellular regions. At this stage, the output cell instances were generally not touching each other, which is a problem when studying cell connectivity in epithelia. Therefore, we applied a 3D Voronoi algorithm to correctly mimic the epithelial packing. More specifically, each predicted cell instance was used as a Voronoi seed, while the predicted mask of the cellular region defined the bounding territory that the cells could occupy. The result of this phase was a large dataset of low-resolution images and their corresponding accurate labels.
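The Voronoi step described above can be sketched as follows. This is a brute-force NumPy illustration under stated assumptions, not the implementation used by CartoCell: the function name, the `spacing` parameter and the nearest-seed rule based on Euclidean distance are all assumptions made for the example. Every voxel inside the cellular mask receives the label of the closest seed voxel, and everything outside the mask stays at 0.

```python
import numpy as np


def voronoi_from_seeds(seeds, mask, spacing=(1.0, 1.0, 1.0)):
    """Label each voxel of `mask` with the label of its nearest seed voxel.

    seeds   : integer (z, y, x) array, 0 = background, >0 = cell instance label
    mask    : boolean (z, y, x) array of the full cellular region
    spacing : assumed voxel size per axis, to account for anisotropic stacks
    """
    seed_voxels = seeds > 0
    seed_coords = np.argwhere(seed_voxels)   # (S, 3), C order
    seed_labels = seeds[seed_voxels]         # (S,), same C order
    target_coords = np.argwhere(mask)        # (T, 3)

    out = np.zeros_like(seeds)
    if len(seed_coords) == 0 or len(target_coords) == 0:
        return out

    scale = np.asarray(spacing, dtype=np.float64)
    # Pairwise offsets between mask voxels and seed voxels: shape (T, S, 3).
    diff = (target_coords[:, None, :] - seed_coords[None, :, :]) * scale
    nearest = (diff ** 2).sum(axis=2).argmin(axis=1)
    out[tuple(target_coords.T)] = seed_labels[nearest]
    return out
```

Memory grows with the number of mask voxels times the number of seed voxels, so this form only suits small crops. For full-size volumes, a distance transform that returns the indices of the nearest seed, such as `scipy.ndimage.distance_transform_edt` with `return_indices=True`, is one common alternative.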
In Phase 4, a new 3D ResU-Net model (model M2, from now on) was trained on the newly produced large dataset of low-resolution images and their paired label images. This was a crucial step, since the performance of deep learning models is highly dependent on the number of training samples.
In Phase 5, model M2 was applied to new low-resolution cysts and its output was post-processed as in Phase 3, thus achieving high-throughput segmentation of the desired cysts.
The data needed is:
low-resolution_raw_images, to run Phases 3 to 5 of the CartoCell pipeline.
We also provide all the properly segmented cysts (ground truth) in Mendeley.
How to train your model
You have two options to train your model: via command line or using Google Colab.
You can reproduce the exact results of our manuscript via the command line using the cartocell_training.yaml configuration file.
In case you want to reproduce our model M1 (Phase 2), you will need to set TRAIN.PATH and TRAIN.GT_PATH to the paths of training_down-sampled_raw_images and training_down-sampled_label_images, respectively.
In case you want to reproduce our model M2 (Phase 4), you need to merge training_down-sampled_raw_images and the Phase 3 (model M1) output into one folder and set its path in TRAIN.PATH. In the same way, you need to merge training_down-sampled_label_images and low-resolution_label_images into one folder and set its path in TRAIN.GT_PATH.
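Merging two image folders into one, as required for model M2, can be done by hand or with a short script. The sketch below uses only the Python standard library; the function name is hypothetical and the folder locations in the usage note are placeholders, not paths prescribed by the pipeline. It refuses to overwrite a file when the same name appears twice, since silently replacing an image would break the pairing between raw and label files.

```python
import shutil
from pathlib import Path


def merge_folders(sources, destination):
    """Copy every file from each folder in `sources` into `destination`."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sources:
        for file in sorted(Path(source).iterdir()):
            if not file.is_file():
                continue  # skip sub-directories
            target = destination / file.name
            if target.exists():
                raise FileExistsError(
                    f"{file.name} already exists in {destination}"
                )
            shutil.copy2(file, target)
            copied.append(target)
    return copied
```

For example, `merge_folders(["training_down-sampled_label_images", "low-resolution_label_images"], "merged_label_images")` would gather both label sets in a single folder, assuming both source folders sit in the current directory.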
The next step is to open a terminal (see Installation section if you need help) and run the code as follows:
# Configuration file
job_cfg_file=/home/user/cartocell_training.yaml
# Where the experiment output directory should be created
result_dir=/home/user/exp_results
# Just a name for the job
job_name=cartocell_training
# Number that should be increased when one need to run the same job multiple times (reproducibility)
job_counter=1
# Number of the GPU to run the job in (according to 'nvidia-smi' command)
gpu_number=0

# Move where BiaPy installation resides
cd BiaPy

# Load the environment
conda activate BiaPy_env
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

python -u main.py \
    --config $job_cfg_file \
    --result_dir $result_dir \
    --name $job_name \
    --run_id $job_counter \
    --gpu $gpu_number
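When the same job has to be repeated with several run identifiers, it can be convenient to assemble the command in a script. The sketch below only rebuilds the argument list shown above, using the same flags; the helper name is hypothetical, and the sketch assumes the BiaPy environment is already activated and that the command is launched from the BiaPy directory.

```python
def build_biapy_command(config, result_dir, name, run_id, gpu):
    """Return the argument list for one run, mirroring the shell command above."""
    return [
        "python", "-u", "main.py",
        "--config", str(config),
        "--result_dir", str(result_dir),
        "--name", str(name),
        "--run_id", str(run_id),
        "--gpu", str(gpu),
    ]
```

The list can be passed to `subprocess.run(cmd, cwd="BiaPy", check=True)`, and a loop over `run_id` values gives one command per repetition.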
Another alternative is to use Google Colab. Note that a standard Google Colab account does not allow you to train for a large number of epochs due to time limitations. Because of this, we set 50 epochs to train and patience to 10, while in the original configuration they are set to 100 respectively. In this case you do not need to download any data, as the notebook will do it for you.
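To make the interaction between the epoch limit and the patience value concrete, here is a generic sketch of patience-based early stopping. It is not BiaPy's implementation: the function name is hypothetical and the monitored quantity is assumed to be a validation loss where lower is better. Training stops either when the epoch limit is reached or when the loss has failed to improve for `patience` consecutive epochs.

```python
def stopping_epoch(val_losses, max_epochs, patience):
    """Return the number of epochs run before training stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # patience exhausted
    # Epoch limit (or end of the recorded losses) reached first.
    return min(len(val_losses), max_epochs)
```

With the Colab settings of 50 epochs and a patience of 10, a run whose loss last improves at epoch 2 would stop at epoch 12, while a run that keeps improving would use all 50 epochs.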
How to run the inference
You can reproduce the exact results of our model M2 (Phase 5) of the manuscript via the command line using the cartocell_inference.yaml configuration file.
You will need to set TEST.GT_PATH with test_dataset_raw_images and test_dataset_label_images data. You will also need to download the model_weights_cartocell.h5 file, which contains the pretrained model, and set its path in
Following the example, the results should be placed in /home/user/exp_results/cartocell/results. You should find the following directory tree:
cartocell/
├── config_files/
│   ├── cartocell_training.yaml
│   └── cartocell_inference.yaml
├── checkpoints
│   └── model_weights_cartocell_1.h5
└── results
    └── cartocell_1
        ├── aug
        │   └── .tif files
        ├── charts
        │   ├── cartocell_1_jaccard_index.png
        │   ├── cartocell_1_loss.png
        │   └── model_plot_cartocell_1.png
        ├── per_image
        │   └── .tif files
        ├── per_image_instances
        │   └── .tif files
        ├── per_image_instances_voronoi
        │   └── .tif files
        └── watershed
            ├── seed_map.tif
            ├── foreground.tif
            └── watershed.tif
config_files: directory where the .yaml files used in the experiment are stored.
cartocell_training.yaml: YAML configuration file used for training.
cartocell_inference.yaml: YAML configuration file used for inference.
checkpoints: directory where model’s weights are stored.
model_weights_cartocell_1.h5: model’s weights file.
results: directory where all the generated checks and results will be stored. It contains one folder per run.
cartocell_1: run 1 experiment folder.
aug: image augmentation samples.
cartocell_1_jaccard_index.png: IoU (jaccard_index) over epochs plot (when training is done).
cartocell_1_loss.png: loss over epochs plot (when training is done).
model_plot_cartocell_1.png: plot of the model.
.tif files in per_image: reconstructed images from patches.
.tif files in per_image_instances: same as per_image but with the instances.
.tif files in per_image_instances_voronoi: same as per_image_instances but with Voronoi applied, which is the only post-processing applied here.
seed_map.tif: initial seeds created before growing.
foreground.tif: foreground mask area that delimits the growth of the seeds.
watershed.tif: result of watershed.
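The directory layout above can be navigated programmatically, for instance to collect the Voronoi-corrected instance images for later analysis. The helpers below are a sketch based only on the tree shown in this tutorial; their names are hypothetical, and the run folder is assumed to be named after the job followed by an underscore and the run number, as in cartocell_1.

```python
from pathlib import Path

# Sub-folders of one run, as listed in the directory tree above.
RUN_SUBFOLDERS = (
    "aug",
    "charts",
    "per_image",
    "per_image_instances",
    "per_image_instances_voronoi",
    "watershed",
)


def run_output_dirs(result_dir, job_name, run_id):
    """Map each sub-folder name to its expected path for one run."""
    run_dir = Path(result_dir) / job_name / "results" / f"{job_name}_{run_id}"
    return {name: run_dir / name for name in RUN_SUBFOLDERS}


def list_tifs(folder):
    """Return the .tif files of a folder in sorted order."""
    return sorted(Path(folder).glob("*.tif"))
```

For the example in this tutorial, `run_output_dirs("/home/user/exp_results", "cartocell", 1)["per_image_instances_voronoi"]` points to the folder holding the final instance segmentations, and `list_tifs` can then enumerate its images.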