(Paper) LightMyCells challenge: self-supervised Vision Transformers for image-to-image labeling

This tutorial aims to reproduce the results reported in the following paper:

Franco-Barranco, Daniel, et al. "Self-supervised Vision Transformers for image-to-image
labeling: a BiaPy solution to the LightMyCells Challenge." 2024 IEEE 21st International
Symposium on Biomedical Imaging (ISBI). IEEE, 2024.

In this work, we address the Cell Painting problem within the LightMyCells challenge at the International Symposium on Biomedical Imaging (ISBI) 2024, aiming to predict optimally focused fluorescence images from label-free transmitted light inputs. We use the image-to-image workflow to solve this problem, where the goal is to learn a mapping between an input image and an output image. We leverage four specialized UNETR-like models, each dedicated to predicting a specific organelle, pretrained in a self-supervised manner using a masked autoencoder (MAE).

[Figure: lightmycells_fig1.png]

Schematic representation of our organelle-specialized 2D UNETR approach. The base model is a modified UNETR architecture pretrained using MAE. Then, four specialized models are fine-tuned independently for identifying specific organelles using an image-to-image workflow with heavy data augmentation.
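
To make the scheme concrete, the minimal Python sketch below illustrates only the routing logic: one independently fine-tuned model per organelle, all applied to the same label-free input. The loader and the dummy predictor are placeholders, not BiaPy code.

# Conceptual sketch of the organelle-specialized scheme shown in the figure.
# It is NOT BiaPy's API: load_finetuned_model and the dummy predictor it
# returns stand in for the MAE-pretrained, UNETR-like fine-tuned models.
from typing import Callable, Dict

import numpy as np

ORGANELLES = ("nucleus", "mitochondria", "actin", "tubulin")  # one model per target

def load_finetuned_model(organelle: str) -> Callable[[np.ndarray], np.ndarray]:
    """Placeholder loader: in practice this would restore the shared
    MAE-pretrained backbone plus the weights fine-tuned for this organelle."""
    def predict(transmitted_light: np.ndarray) -> np.ndarray:
        # Dummy output with the input's shape; a real model would return the
        # predicted fluorescence image for this organelle.
        return np.zeros_like(transmitted_light, dtype=np.float32)
    return predict

# Four independent, organelle-specialized predictors.
models: Dict[str, Callable[[np.ndarray], np.ndarray]] = {
    org: load_finetuned_model(org) for org in ORGANELLES
}

def predict_all(transmitted_light: np.ndarray) -> Dict[str, np.ndarray]:
    """Run every specialized model on the same label-free input image."""
    return {org: model(transmitted_light) for org, model in models.items()}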

We refer the reader to our paper, cited above, for the full details of our approach.

Data preparation

Currently, the LightMyCells challenge data can be downloaded from the challenge’s page (registration is required). In the near future, the organizers will publish the data in the BioImage Archive.

You can use the lightmycell_data_preparation.py script available here to organize your data directory structure. This script converts the Study_* folder arrangement from the challenge into the data organization required by BiaPy.

In our proposed approach we implemented a custom data loader to handle more than one out-of-focus image per sample. To ensure the library operates properly, the data directory tree should look like this (actin training data shown as an example):

lightmycells_dataset/
├── train
│   ├── x
│   │   ├── Study_3_BF_image_53_Actin.ome.tiff/
│   │   │   ├── Study_3_BF_image_53_BF_z0.ome.tiff
│   │   │   ├── Study_3_BF_image_53_BF_z1.ome.tiff
│   │   │   ├── . . .
│   │   │   └── Study_3_BF_image_53_BF_z19.ome.tiff
│   │   ├── Study_3_BF_image_54_Actin.ome.tiff/
│   │   │   └── Study_3_BF_image_54_BF_z0.ome.tiff
│   │   ├── . . .
│   │   └── Study_6_PC_image_111_Actin.ome.tiff/
│   │       ├── Study_6_PC_image_111_PC_z0.ome.tiff
│   │       ├── Study_6_PC_image_111_PC_z1.ome.tiff
│   │       ├── . . .
│   │       └── Study_6_PC_image_111_PC_z10.ome.tiff
│   └── y
│       ├── Study_3_BF_image_53_Actin.ome.tiff/
│       │   └── Study_3_BF_image_53_Actin.ome.tiff
│       ├── Study_3_BF_image_54_Actin.ome.tiff/
│       │   └── Study_3_BF_image_54_Actin.ome.tiff
│       ├── . . .
│       └── Study_6_PC_image_111_Actin.ome.tiff/
│           └── Study_6_PC_image_111_Actin.ome.tiff
└── val
    ├── . . .

For the new images you want to predict (test data), you can follow the same directory structure or simply place all the images in a single directory. You can also reuse the validation folder as test data.
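
If you prefer to organize the files yourself instead of using the script, a minimal sketch along the following lines can be used. This is not the official lightmycell_data_preparation.py: it assumes each Study_* folder directly contains both the transmitted-light z-slices and the fluorescence targets, named as in the tree above, so adapt the patterns to your download.

# Illustrative sketch only (not the official lightmycell_data_preparation.py).
# Assumption: each Study_* folder mixes transmitted-light z-slices
# (*_BF_z*.ome.tiff, *_PC_z*.ome.tiff, ...) with their fluorescence targets
# (*_Actin.ome.tiff, *_Nucleus.ome.tiff, ...).
import glob
import os
import shutil

SRC = "/path/to/challenge_download"         # folder containing the Study_* directories
DST = "/path/to/lightmycells_dataset/train"
ORGANELLE = "Actin"                         # build one organelle-specific split at a time

for target in glob.glob(os.path.join(SRC, "Study_*", f"*_{ORGANELLE}.ome.tiff")):
    name = os.path.basename(target)                      # e.g. Study_3_BF_image_53_Actin.ome.tiff
    prefix = name.replace(f"_{ORGANELLE}.ome.tiff", "")   # e.g. Study_3_BF_image_53
    x_dir = os.path.join(DST, "x", name)                  # per-sample folder with all z-slices
    y_dir = os.path.join(DST, "y", name)                  # per-sample folder with the single target
    os.makedirs(x_dir, exist_ok=True)
    os.makedirs(y_dir, exist_ok=True)

    # Copy every out-of-focus transmitted-light slice belonging to this sample.
    for z_slice in glob.glob(os.path.join(os.path.dirname(target), prefix + "_*_z*.ome.tiff")):
        shutil.copy(z_slice, x_dir)

    # Copy the fluorescence ground truth into its mirrored folder under y/.
    shutil.copy(target, y_dir)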

Run

To run the workflow, you first need to download the templates of our four specialized models:

Then you need to modify TRAIN.PATH and TRAIN.GT_PATH with the paths to your training transmitted-light images and fluorescence targets, respectively. Do the same for the validation data with VAL.PATH and VAL.GT_PATH (we use 10% of the training samples as validation). For the test data, set TEST.PATH; you can point it to the same validation path.
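
Since the templates are YAML files, you can edit those variables by hand or patch them programmatically. The helper below is a minimal sketch (not part of BiaPy, requires PyYAML): it sets dotted keys in a loaded YAML file. The template filename is a placeholder, and the actual nesting of these variables in your template may differ, so adjust the dotted keys to match your file before running it.

# Minimal sketch (not the official BiaPy API): update dotted keys such as
# "TRAIN.PATH" inside a downloaded YAML template. Check how the variables are
# nested in your template and adapt the dotted keys accordingly.
import yaml

def set_dotted_key(cfg: dict, dotted_key: str, value: str) -> None:
    """Walk a nested dict following 'A.B.C' and set the final key."""
    keys = dotted_key.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

with open("lightmycells_nucleus.yaml") as f:   # hypothetical template filename
    config = yaml.safe_load(f)

set_dotted_key(config, "TRAIN.PATH", "/path/to/lightmycells_dataset/train/x")
set_dotted_key(config, "TRAIN.GT_PATH", "/path/to/lightmycells_dataset/train/y")
set_dotted_key(config, "VAL.PATH", "/path/to/lightmycells_dataset/val/x")
set_dotted_key(config, "VAL.GT_PATH", "/path/to/lightmycells_dataset/val/y")
set_dotted_key(config, "TEST.PATH", "/path/to/lightmycells_dataset/val/x")

with open("lightmycells_nucleus_edited.yaml", "w") as f:
    yaml.safe_dump(config, f)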

Then, you can either train those models yourself or use our checkpoints directly:

To use our checkpoints you first need to download them (soon available in the BioImage Model Zoo):

You need to update each configuration with the location of the corresponding checkpoint so BiaPy can find it (use the PATHS.CHECKPOINT_FILE variable). For example, for the nucleus model, change PATHS.CHECKPOINT_FILE to the location of your nucleus checkpoint, e.g. /home/user/Downloads/lightmycells_nucleus.pth.
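
The same kind of patching can be applied to all four configurations at once. In the sketch below (requires PyYAML), only the nucleus checkpoint path comes from the example above; the other template and checkpoint filenames are assumptions following the same naming pattern.

# Sketch: point each organelle-specialized template at its downloaded
# checkpoint via PATHS.CHECKPOINT_FILE. Filenames other than the nucleus
# example are assumptions; adjust the nesting if your template differs.
import yaml

checkpoints = {
    "lightmycells_nucleus.yaml": "/home/user/Downloads/lightmycells_nucleus.pth",
    "lightmycells_mitochondria.yaml": "/home/user/Downloads/lightmycells_mitochondria.pth",
    "lightmycells_actin.yaml": "/home/user/Downloads/lightmycells_actin.pth",
    "lightmycells_tubulin.yaml": "/home/user/Downloads/lightmycells_tubulin.pth",
}

for template, ckpt in checkpoints.items():
    with open(template) as f:
        cfg = yaml.safe_load(f)
    cfg.setdefault("PATHS", {})["CHECKPOINT_FILE"] = ckpt
    with open(template, "w") as f:
        yaml.safe_dump(cfg, f)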

Alternatively, you can use our notebook, prepared for running inference only:

  • Inference notebook: lightmycell_colablink

Results

The results are placed in the results folder inside the --result_dir directory, under the --name given. All the images are stored in a folder called per_image. There you should see images like the ones depicted below:

[Figure: lightmycells_fig2.png]

Results of our approach on the LightMyCells challenge.
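
If you want to quickly inspect the predictions yourself, the snippet below (not part of BiaPy) searches the --result_dir recursively for a per_image folder and displays the first prediction it finds; tifffile and matplotlib are assumed to be installed.

# Quick sanity check: locate a per_image folder produced by the run and show
# the first predicted image. Assumes single-channel 2D predictions.
import glob
import os

import matplotlib.pyplot as plt
import tifffile

result_dir = "/home/user/results"   # the value passed to --result_dir

for per_image_dir in glob.glob(os.path.join(result_dir, "**", "per_image"), recursive=True):
    predictions = sorted(glob.glob(os.path.join(per_image_dir, "*.tif*")))
    if predictions:
        img = tifffile.imread(predictions[0])
        plt.imshow(img.squeeze(), cmap="gray")
        plt.title(os.path.basename(predictions[0]))
        plt.axis("off")
        plt.show()
        break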