biapy.data.generators

BiaPy data generators package.

This package provides data generator classes and utility functions for loading, augmenting, and batching image and mask data for deep learning workflows in BiaPy. It supports 2D and 3D data, chunked loading, distributed training, and advanced augmentation pipelines.

biapy.data.generators.create_train_val_augmentors(cfg: CfgNode, system_dict: Dict[str, Any], X_train: BiaPyDataset, X_val: BiaPyDataset, norm_module: Dict, Y_train: BiaPyDataset | None = None, Y_val: BiaPyDataset | None = None) → Tuple[DataLoader, DataLoader, int, ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]

Create training and validation generators.

Parameters:

cfg (CfgNode) – BiaPy configuration.
system_dict (dict) –
System dictionary containing:
- ‘cpu_budget’: int, Total CPU budget.
- ‘cpu_per_rank’: int, CPU budget per rank.
- ‘main_threads’: int, Number of main threads.
- ‘num_workers_hint’: int, Hint for the number of workers.
X_train (BiaPyDataset) – Loaded train X data.
X_val (BiaPyDataset) – Loaded validation X data.
norm_module (Dict) – Normalization module that defines the normalization steps to apply.
Y_train (BiaPyDataset, optional) – Loaded train Y data.
Y_val (BiaPyDataset, optional) – Loaded validation Y data.

Returns:

train_generator (DataLoader) – Training data generator.
val_generator (DataLoader) – Validation data generator.
num_training_steps_per_epoch (int) – Number of training steps per epoch.
bmz_input_sample (4D/5D Numpy array) – Sample of the input data to be used for exporting the model to BMZ. Shape is (1, y, x, channels) for 2D or (1, z, y, x, channels) for 3D.
cover_raw (4D/5D Numpy array) – Sample of the raw cover data to be used for exporting the model to BMZ. Shape is (1, y, x, channels) for 2D or (1, z, y, x, channels) for 3D.
cover_gt (4D/5D Numpy array) – Sample of the GT cover data to be used for exporting the model to BMZ. Shape is (1, y, x, channels) for 2D or (1, z, y, x, channels) for 3D.
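
The system_dict argument is documented only through its four keys. As a rough illustration, the sketch below assembles a dictionary with those keys. The helper name build_system_dict, the use of os.cpu_count(), and the way the budget is split across ranks are all assumptions made for this example, not BiaPy's own logic.

```python
import os
from typing import Any, Dict


def build_system_dict(world_size: int = 1, main_threads: int = 1) -> Dict[str, Any]:
    """Hypothetical helper: build a dict with the four documented keys."""
    # Assumption: the total budget is the number of CPUs visible to this process.
    cpu_budget = os.cpu_count() or 1
    # Assumption: the budget is split evenly across ranks, with at least 1 each.
    cpu_per_rank = max(1, cpu_budget // max(1, world_size))
    # Assumption: workers get whatever is left after the main threads.
    num_workers_hint = max(0, cpu_per_rank - main_threads)
    return {
        "cpu_budget": cpu_budget,
        "cpu_per_rank": cpu_per_rank,
        "main_threads": main_threads,
        "num_workers_hint": num_workers_hint,
    }


system_dict = build_system_dict(world_size=2)
print(sorted(system_dict))
```

A dictionary of this shape would then be passed as the system_dict argument alongside the configuration and the loaded datasets.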

biapy.data.generators.create_test_generator(cfg: CfgNode, X_test: Any, Y_test: Any, norm_module: Dict) → Tuple[test_pair_data_generator | test_single_data_generator, ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]

Create test data generator.

Parameters:

cfg (CfgNode) – BiaPy configuration.
X_test (4D/5D Numpy array) – Test data. E.g. (num_of_images, y, x, channels) for 2D or (num_of_images, z, y, x, channels) for 3D.
Y_test (4D/5D Numpy array) – Test data mask/class. E.g. (num_of_images, y, x, channels) for 2D or (num_of_images, z, y, x, channels) for 3D in all workflows except classification, where the shape is (num_of_images, class) for both 2D and 3D.
norm_module (Dict) – Normalization module that defines the normalization steps to apply.

Returns:

test_generator (test_pair_data_generator/test_single_data_generator) – Test data generator.
bmz_input_sample (4D/5D Numpy array) – Sample of the input data to be used for exporting the model to BMZ. Shape is (1, y, x, channels) for 2D or (1, z, y, x, channels) for 3D.
cover_raw (4D/5D Numpy array) – Sample of the raw cover data to be used for exporting the model to BMZ. Shape is (1, y, x, channels) for 2D or (1, z, y, x, channels) for 3D.
cover_gt (4D/5D Numpy array) – Sample of the GT cover data to be used for exporting the model to BMZ. Shape is (1, y, x, channels) for 2D or (1, z, y, x, channels) for 3D.
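
The array layouts named above can be made concrete with a few placeholder arrays. This is only a sketch of the shapes: the sizes are arbitrary, and treating the classification "class" axis as a single label per image is an assumption.

```python
import numpy as np

# 2D workflow: (num_of_images, y, x, channels)
X_test_2d = np.zeros((4, 64, 64, 1), dtype=np.float32)
# 3D workflow: (num_of_images, z, y, x, channels)
X_test_3d = np.zeros((4, 16, 64, 64, 1), dtype=np.float32)
# Classification: (num_of_images, class) for both 2D and 3D.
# Assumption: one integer label per image.
Y_test_cls = np.zeros((4, 1), dtype=np.int64)

# A BMZ-style sample keeps a leading axis of size 1:
# (1, y, x, channels) for 2D or (1, z, y, x, channels) for 3D.
sample_2d = X_test_2d[:1]
sample_3d = X_test_3d[:1]
print(sample_2d.shape, sample_3d.shape)
```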

biapy.data.generators.by_chunks_collate_fn(data)

Collate function used in place of the default one to avoid its type checking. It does nothing special beyond stacking the images.

Parameters: data (tuple) – Data tuple.
Returns: data – Stacked data in batches.
Return type: tuple
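
To illustrate what "stacking without type checking" can look like, here is a minimal sketch of a collate function written with NumPy. The name stack_collate and the per-sample layout (image, coordinates) are hypothetical. The actual function may handle different fields or produce tensors rather than arrays.

```python
import numpy as np


def stack_collate(batch):
    """Hypothetical collate: stack each field across the batch, no type checks."""
    # `batch` is a sequence of per-sample tuples, e.g. [(img0, pos0), (img1, pos1)].
    # zip(*batch) regroups it field by field; each field is stacked on a new axis 0.
    return tuple(np.stack(field) for field in zip(*batch))


samples = [
    (np.zeros((8, 8, 1)), np.array([0, 0])),
    (np.ones((8, 8, 1)), np.array([0, 8])),
]
images, coords = stack_collate(samples)
print(images.shape, coords.shape)  # (2, 8, 8, 1) (2, 2)
```

A function of this kind is what gets passed through the collate_fn argument of a PyTorch DataLoader.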

biapy.data.generators.create_chunked_test_generator(cfg: CfgNode, system_dict: Dict[str, Any], current_sample: Dict, norm_module: Dict, out_dir: str, dtype_str: str) → DataLoader

Create a DataLoader for chunked test data using chunked_test_pair_data_generator.

This function sets up a generator for efficient inference on large volumetric datasets by processing data in chunks. It configures the generator with the appropriate axes, patch size, padding, and normalization, and wraps it in a PyTorch DataLoader with optimal worker settings for distributed or single-GPU environments.

Parameters:

cfg (CfgNode) – BiaPy configuration node.
system_dict (dict) –
System dictionary containing:
- ‘cpu_budget’: int, Total CPU budget.
- ‘cpu_per_rank’: int, CPU budget per rank.
- ‘main_threads’: int, Number of main threads.
- ‘num_workers_hint’: int, Hint for the number of workers.
current_sample (dict) – Dictionary containing the sample to process (e.g., file pointers, data arrays).
norm_module (Dict) – Normalization module to apply to the data.
out_dir (str) – Output directory to save results.
dtype_str (str) – Data type string for output files.

Returns:

test_dataset – PyTorch DataLoader wrapping the chunked test data generator.

Return type:

DataLoader
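
The general idea of processing a large volume in chunks can be sketched as follows. This is a simplified illustration, not the chunked_test_pair_data_generator implementation: the helper iter_chunks is hypothetical, and it omits the padding, overlap, axis-order handling, and normalization that the real generator is described as configuring.

```python
import itertools

import numpy as np


def iter_chunks(volume, patch):
    """Yield (slices, chunk) pairs that tile `volume` with `patch`-sized chunks."""
    # One range of start offsets per axis; the last chunk on an axis may be smaller.
    starts = [range(0, dim, p) for dim, p in zip(volume.shape, patch)]
    for origin in itertools.product(*starts):
        slices = tuple(
            slice(o, min(o + p, dim))
            for o, p, dim in zip(origin, patch, volume.shape)
        )
        yield slices, volume[slices]


volume = np.arange(4 * 6 * 6).reshape(4, 6, 6)  # toy (z, y, x) volume
output = np.zeros_like(volume)
n_chunks = 0
for slices, chunk in iter_chunks(volume, patch=(2, 4, 4)):
    output[slices] = chunk  # a real pipeline would run the model on `chunk` here
    n_chunks += 1
print(n_chunks)
```

Because each chunk carries its own slices, results can be written back to the right location without ever holding more than one chunk of model input in memory.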

biapy.data.generators.by_chunks_workflow_collate_fn(data)

Collate function used in place of the default one to avoid its type checking. It does nothing special beyond stacking the images.

Parameters: data (tuple) – Data tuple.
Returns: data – Stacked data in batches.
Return type: tuple

biapy.data.generators.create_chunked_workflow_process_generator(cfg: CfgNode, system_dict: Dict[str, Any], model_predictions: str, out_dir: str, dtype_str: str) → DataLoader

Create a DataLoader for chunked test data using chunked_workflow_process_generator.

This function sets up a generator for efficient inference on large volumetric datasets by processing data in chunks. It configures the generator with the appropriate axes, patch size, padding, and normalization, and wraps it in a PyTorch DataLoader with optimal worker settings for distributed or single-GPU environments.

Parameters:

cfg (CfgNode) – BiaPy configuration node.
system_dict (dict) –
System dictionary containing:
- ‘cpu_budget’: int, Total CPU budget.
- ‘cpu_per_rank’: int, CPU budget per rank.
- ‘main_threads’: int, Number of main threads.
- ‘num_workers_hint’: int, Hint for the number of workers.
model_predictions (str) – Path to the model predictions to process.
out_dir (str) – Output directory to save results.
dtype_str (str) – Data type string for output files.

Returns:

test_dataset – PyTorch DataLoader wrapping the chunked test data generator.

Return type:

DataLoader

biapy.data.generators.check_generator_consistence(gen: DataLoader, data_out_dir: str, mask_out_dir: str, filenames: List[str] | None = None)

Save all the data produced by a generator to the given paths.

Parameters:

gen (Pair2DImageDataGenerator/Single2DImageDataGenerator (2D) or Pair3DImageDataGenerator/Single3DImageDataGenerator (3D)) – Generator to extract the data from.
data_out_dir (str) – Path to store the generator data samples.
mask_out_dir (str) – Path to store the generator data mask samples.
filenames (List[str], optional) – Filenames that should be used when saving each image.
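
A consistency check of this kind amounts to iterating the generator once and writing every image and mask to disk for inspection. The sketch below shows that pattern under several assumptions: the helper save_generator_samples is hypothetical, each batch is taken to be an (images, masks) pair, and the .npy output format is chosen only to keep the example self-contained.

```python
import os
import tempfile

import numpy as np


def save_generator_samples(gen, data_out_dir, mask_out_dir, filenames=None):
    """Hypothetical sketch: write every (image, mask) sample of `gen` to disk."""
    os.makedirs(data_out_dir, exist_ok=True)
    os.makedirs(mask_out_dir, exist_ok=True)
    count = 0
    for batch_x, batch_y in gen:
        for x, y in zip(batch_x, batch_y):
            # Use the caller's name if given, otherwise a running index.
            name = filenames[count] if filenames is not None else f"sample_{count:04d}"
            np.save(os.path.join(data_out_dir, name + ".npy"), x)
            np.save(os.path.join(mask_out_dir, name + ".npy"), y)
            count += 1
    return count


# Toy "generator": two batches of two 8x8 single-channel image/mask pairs.
toy_gen = [(np.zeros((2, 8, 8, 1)), np.ones((2, 8, 8, 1))) for _ in range(2)]
with tempfile.TemporaryDirectory() as tmp:
    n_saved = save_generator_samples(
        toy_gen, os.path.join(tmp, "x"), os.path.join(tmp, "y")
    )
    n_files = len(os.listdir(os.path.join(tmp, "x")))
print(n_saved, n_files)
```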

Submodules

Module	Description
`biapy.data.generators.augmentors`	Data augmentation utilities for generators.
`biapy.data.generators.pair_base_data_generator`	Base class for generators that handle paired data (2D and 3D).
`biapy.data.generators.pair_data_2D_generator`	Data generator for paired 2D images (e.g., input–target pairs).
`biapy.data.generators.pair_data_3D_generator`	Data generator for paired 3D volumes (e.g., input–target pairs).
`biapy.data.generators.single_base_data_generator`	Base class for generators that handle single-input data (2D and 3D).
`biapy.data.generators.single_data_2D_generator`	Data generator for single-input 2D images.
`biapy.data.generators.single_data_3D_generator`	Data generator for single-input 3D volumes.