(Paper) CartoCell, a high-throughput pipeline for accurate 3D image analysis

This tutorial describes how to train and run inference with our custom 3D ResU-Net DNN in order to reproduce the results obtained in (Andrés-San Román et al., 2023). Starting from an initial training dataset of 21 segmented epithelial 3D cysts acquired with confocal microscopy, we follow the CartoCell pipeline (figure below) to automatically segment hundreds of cysts acquired at low resolution in a high-throughput manner.

[Figure: CartoCell pipeline for high-throughput epithelial cyst segmentation.]

[Animation: cyst raw image.]

[Animation: cyst label image.]

Paper citation:

Andrés-San Román, Jesús A., et al. "CartoCell, a high-content pipeline for 3D image
analysis, unveils cell morphology patterns in epithelia." Cell Reports Methods
3.10 (2023).

CartoCell phases

A small dataset of 21 cysts, stained with cell outline markers, was acquired at high resolution with a confocal microscope. Next, the individual cell instances were segmented. The high-resolution images from Phase 1 provide the accurate and realistic data needed for the following steps.

Data preparation

The data is accessible through Zenodo. The data you need for each phase is:

  • train_M1 and validation to feed the initial model (model M1, Phase 2).

  • train_M2 to run Phases 3–5 of the CartoCell pipeline.

  • test if you just want to run the inference using our pretrained model M2.
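For reference, the configuration excerpts later in this tutorial assume the downloaded folders are organized as follows. The /home/user/data root is just a hypothetical location; any path works as long as the configuration files point to it:

/home/user/data/
├── train_M1/
│   ├── x/   # raw images
│   └── y/   # corresponding labels
├── train_M2/
│   ├── x/
│   └── y/
├── validation/
│   ├── x/
│   └── y/
└── test/
    ├── x/
    └── y/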

How to train your model

You have two options to train your model: via command line or using Google Colab.

You can reproduce the exact results of our manuscript via the command line using the cartocell_training.yaml configuration file.

  • In case you want to reproduce our model M1 (Phase 2), you will need to set DATA.TRAIN.PATH and DATA.TRAIN.GT_PATH to the raw images and their corresponding labels, that is to say, to the paths of train_M1/x and train_M1/y, respectively.

  • In case you want to reproduce our model M2 (Phase 4), you will need to set DATA.TRAIN.PATH and DATA.TRAIN.GT_PATH as above but now using the paths of train_M2/x and train_M2/y.

For the validation data, for both model M1 and model M2, you will need to set DATA.VAL.PATH and DATA.VAL.GT_PATH to the raw images and their corresponding labels, that is to say, to the paths of validation/x and validation/y, respectively (see the excerpt below).
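For instance, for model M1 the relevant part of cartocell_training.yaml would look like the sketch below, where the /home/user/data prefix is the hypothetical download location from the Data preparation section:

DATA:
  TRAIN:
    PATH: /home/user/data/train_M1/x      # raw training images (train_M2/x for model M2)
    GT_PATH: /home/user/data/train_M1/y   # corresponding labels (train_M2/y for model M2)
  VAL:
    PATH: /home/user/data/validation/x    # raw validation images
    GT_PATH: /home/user/data/validation/y # corresponding labels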

The next step is to open a terminal and run the code as follows:

# Configuration file
job_cfg_file=/home/user/cartocell_training.yaml
# Where the experiment output directory should be created
result_dir=/home/user/exp_results
# Just a name for the job (output directories are named after it)
job_name=cartocell
# Number that should be increased when one needs to run the same job multiple times (reproducibility)
job_counter=1
# Number of the GPU to run the job in (according to 'nvidia-smi' command)
gpu_number=0

# Get the BiaPy code and switch to the commit used in the manuscript
git clone https://github.com/BiaPyX/BiaPy.git
cd BiaPy
git checkout 2bfa7508c36694e0977fdf2c828e3b424011e4b1

# Load the conda environment with BiaPy's dependencies installed
conda activate BiaPy_env

python -u main.py \
    --config $job_cfg_file \
    --result_dir $result_dir  \
    --name $job_name    \
    --run_id $job_counter  \
    --gpu $gpu_number
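Once training finishes, the trained weights and the training charts are written under $result_dir/$job_name (here, /home/user/exp_results/cartocell); see the Results section below for the full directory layout.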

How to run the inference

You can reproduce the exact results of our model M2 (Phase 5) of the manuscript via the command line using the cartocell_inference.yaml configuration file.

You will need to set DATA.TEST.PATH and DATA.TEST.GT_PATH to the test/x and test/y paths. You will also need to download the model_weights_cartocell.h5 file, which contains the pretrained model, and set its path in PATHS.CHECKPOINT_FILE, as in the excerpt below.
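The corresponding part of cartocell_inference.yaml would look like this sketch, with hypothetical paths again:

DATA:
  TEST:
    PATH: /home/user/data/test/x    # raw test images
    GT_PATH: /home/user/data/test/y # corresponding labels
PATHS:
  CHECKPOINT_FILE: /home/user/model_weights_cartocell.h5  # downloaded pretrained model M2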

The next step is to open a terminal and run the code as follows:

# Configuration file
job_cfg_file=/home/user/cartocell_inference.yaml
# Where the experiment output directory should be created
result_dir=/home/user/exp_results
# Just a name for the job (output directories are named after it)
job_name=cartocell
# Number that should be increased when one needs to run the same job multiple times (reproducibility)
job_counter=1
# Number of the GPU to run the job in (according to 'nvidia-smi' command)
gpu_number=0

# Get the BiaPy code (skip if you already did it in the previous steps)
git clone https://github.com/BiaPyX/BiaPy.git
cd BiaPy
git checkout 2bfa7508c36694e0977fdf2c828e3b424011e4b1

# Load the conda environment with BiaPy's dependencies installed
conda activate BiaPy_env

python -u main.py \
    --config $job_cfg_file \
    --result_dir $result_dir  \
    --name $job_name    \
    --run_id $job_counter  \
    --gpu $gpu_number
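When inference finishes, the reconstructed predictions and their extracted instances are written as .tif files under results/cartocell_1/per_image and results/cartocell_1/per_image_instances, as detailed in the Results section below.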

Results

Following the example, the experiment outputs should be placed under /home/user/exp_results/cartocell. You should find the following directory tree:

cartocell/
├── config_files/
│   ├── cartocell_training.yaml
│   └── cartocell_inference.yaml
├── checkpoints/
│   └── model_weights_cartocell_1.h5
└── results/
    └── cartocell_1/
        ├── aug/
        │   └── .tif files
        ├── charts/
        │   ├── cartocell_1_jaccard_index.png
        │   ├── cartocell_1_loss.png
        │   └── model_plot_cartocell_1.png
        ├── per_image/
        │   └── .tif files
        ├── per_image_instances/
        │   └── .tif files
        ├── per_image_instances_voronoi/
        │   └── .tif files
        └── watershed/
            ├── seed_map.tif
            ├── foreground.tif
            └── watershed.tif
  • config_files: directory where the .yaml files used in the experiment are stored.

    • cartocell_training.yaml: YAML configuration file used for training.

    • cartocell_inference.yaml: YAML configuration file used for inference.

  • checkpoints: directory where model’s weights are stored.

    • model_weights_cartocell_1.h5: model’s weights file.

  • results: directory where all the generated checks and results are stored. There, one folder per run will be created.

    • cartocell_1: run 1 experiment folder.

      • aug: image augmentation samples.

      • charts:

        • cartocell_1_jaccard_index.png: IoU (jaccard_index) over epochs plot (when training is done).

        • cartocell_1_loss.png: loss over epochs plot (when training is done).

        • model_plot_cartocell_1.png: plot of the model.

      • per_image:

        • .tif files: reconstructed images from patches.

      • per_image_instances:

        • .tif files: same as per_image but with the instances.

      • per_image_instances_voronoi:

        • .tif files: same as per_image_instances but with Voronoi applied, which is the only post-processing used here.

      • watershed:

        • seed_map.tif: initial seeds created before growing.

        • foreground.tif: foreground mask that delimits the growth of the seeds.

        • watershed.tif: result of watershed.