biapy.data.data_manipulation

Data Manipulation Module for BiaPy.

This module provides a collection of functions for loading, processing, and manipulating biological image data for deep learning applications. It supports both 2D and 3D data formats, including common file types like TIFF, HDF5, Zarr, and NumPy arrays.

Key Functionalities:

  • Loading training, validation, and test data from various formats

  • Data preprocessing and normalization

  • Image cropping and patching with overlap

  • Data filtering based on various properties

  • Cross-validation and train-test splitting

  • Data augmentation and shape manipulation

  • Format conversion (e.g., to one-hot encoding)

  • Data saving in multiple formats

The module supports:

  • Both 2D and 3D image data

  • Multiple input formats (TIFF, HDF5, Zarr, NumPy arrays)

  • Classification and segmentation workflows

  • Memory-efficient loading of large datasets

  • Parallel processing capabilities

  • Data validation and consistency checks

Main Classes and Functions:

  • load_and_prepare_train_data(): Main function for loading training data

  • load_and_prepare_test_data(): Function for loading test data

  • load_and_prepare_cls_test_data(): For classification test data

  • samples_from_image_list(): Creates dataset from image list

  • samples_from_zarr(): Handles Zarr/HDF5 datasets

  • filter_samples_by_properties(): Filters data based on conditions

  • img_to_onehot_encoding(): Converts masks to one-hot format

  • save_tif(), save_npy_files(): Data saving utilities

Typical Workflow:

  1. Load data using one of the load_and_prepare_* functions

  2. Apply preprocessing/normalization

  3. Filter or augment data as needed

  4. Use in training or save processed data

biapy.data.data_manipulation.load_and_prepare_train_data(train_path: str, train_mask_path: str, train_in_memory: str, train_ov: Tuple[float, ...], train_padding: Tuple[int, ...], val_path: str, val_mask_path: str, val_in_memory: bool, val_ov: Tuple[float, ...], val_padding: Tuple[int, ...], norm_module: Dict, crop_shape: Tuple[int, ...], cross_val: bool = False, cross_val_nsplits: int = 5, cross_val_fold: int = 1, val_split: float = 0.1, seed: int = 0, shuffle_val: bool = True, train_preprocess_f: Callable | None = None, train_preprocess_cfg: CfgNode | None = None, train_filter_props: List[List[str]] = [], train_filter_vals: List[List[float]] = [], train_filter_signs: List[List[str]] = [], val_preprocess_f: Callable | None = None, val_preprocess_cfg: CfgNode | None = None, val_filter_props: List[List[str]] = [], val_filter_vals: List[List[float]] = [], val_filter_signs: List[List[str]] = [], filter_by_entire_image: bool = True, norm_before_filter: bool = False, random_crops_in_DA: bool = False, y_upscaling: Tuple[int, ...] = (1, 1), gt_channels_expected: int = 1, reflect_to_complete_shape: bool = False, convert_to_rgb: bool = False, is_y_mask: bool = False, is_3d: bool = False, train_zarr_data_information: Dict | None = None, val_zarr_data_information: Dict | None = None, multiple_raw_images: bool = False, save_filtered_images: bool = True, save_filtered_images_dir: str | None = None, save_filtered_images_num: int = 3) Tuple[BiaPyDataset, BiaPyDataset, BiaPyDataset, BiaPyDataset][source]

Load training and validation data.

Parameters:
  • train_path (str) – Path to the training data.

  • train_mask_path (str) – Path to the training data masks.

  • train_in_memory (str) – Whether the training data must be loaded in memory or not.

  • train_ov (2D/3D float tuple, optional) – Minimum overlap along each axis for the train data. Values must be in the range [0, 1), that is, from 0% to 99% overlap. Shape is (y, x) for 2D or (z, y, x) for 3D.

  • train_padding (2D/3D int tuple, optional) – Size of padding to be added on each axis to the train data. Shape is (y, x) for 2D or (z, y, x) for 3D.

  • val_path (str) – Path to the validation data.

  • val_mask_path (str) – Path to the validation data masks.

  • val_in_memory (bool) – Whether the validation data must be loaded in memory or not.

  • val_ov (2D/3D float tuple, optional) – Minimum overlap along each axis for the validation data. Values must be in the range [0, 1), that is, from 0% to 99% overlap. Shape is (y, x) for 2D or (z, y, x) for 3D.

  • val_padding (2D/3D int tuple, optional) – Size of padding to be added on each axis to the val data. Shape is (y, x) for 2D or (z, y, x) for 3D.

  • norm_module (Dict) – Information about the normalization.

  • crop_shape (3D/4D int tuple, optional) – Shape of the crops. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

  • cross_val (bool, optional) – Whether to use cross validation or not.

  • cross_val_nsplits (int, optional) – Number of folds for the cross validation.

  • cross_val_fold (int, optional) – Number of the fold to be used as validation.

  • val_split (float, optional) – Fraction of the train data used as validation (value between 0 and 1).

  • seed (int, optional) – Seed value.

  • shuffle_val (bool, optional) – Take random training examples to create validation data.

  • train_preprocess_f (function, optional) – The train preprocessing function. Only needed if you want to apply preprocessing.

  • train_preprocess_cfg (dict, optional) – Configuration parameters for train preprocessing. Only needed if you want to apply preprocessing.

  • train_filter_props (list of lists of str) – Filter conditions to be applied to the train data. The three variables filter_props, filter_vals and filter_signs together compose a list of conditions used to remove samples from the list. Each of them is a list of lists of conditions. For instance, the conditions can look like this: [['A'], ['B','C']]. A sample is then removed if it satisfies the first list of conditions (only 'A' in this case, from the ['A'] list) or if it satisfies both 'B' and 'C' (from the ['B','C'] list). Within each sublist all the conditions must be satisfied. Available properties are: ['foreground', 'mean', 'min', 'max'].

    Each property description:

    • 'foreground' is defined as the mask foreground percentage.

    • 'mean' is defined as the mean value.

    • 'min' is defined as the min value.

    • 'max' is defined as the max value.

    • 'diff' is defined as the difference between ground truth and raw images. Requires y_dataset to be provided.

    • 'diff_by_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between raw image max and min.

    • 'target_mean' is defined as the mean intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'target_min' is defined as the min intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'target_max' is defined as the max intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'diff_by_target_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between ground truth image max and min.

  • train_filter_vals (list of lists of int/float) – Threshold values for the properties listed in train_filter_props. Each value is compared with the corresponding property, using the matching sign in train_filter_signs, to decide whether a sample is removed.

  • train_filter_signs (list of lists of str) – Comparison signs used for train data filtering. Options: ['gt', 'ge', 'lt', 'le'], which correspond to "greater than" (>), "greater than or equal" (>=), "less than" (<), and "less than or equal" (<=).

  • val_preprocess_f (function, optional) – The validation preprocessing function. Only needed if you want to apply preprocessing.

  • val_preprocess_cfg (dict, optional) – Configuration parameters for validation preprocessing. Only needed if you want to apply preprocessing.

  • val_filter_props (list of lists of str) – Filter conditions to be applied to the validation data. The three variables filter_props, filter_vals and filter_signs together compose a list of conditions used to remove images from the list. Each of them is a list of lists of conditions. For instance, the conditions can look like this: [['A'], ['B','C']]. A sample is then removed if it satisfies the first list of conditions (only 'A' in this case, from the ['A'] list) or if it satisfies both 'B' and 'C' (from the ['B','C'] list). Within each sublist all the conditions must be satisfied. Available properties are: ['foreground', 'mean', 'min', 'max'].

    Each property description:

    • 'foreground' is defined as the mask foreground percentage.

    • 'mean' is defined as the mean value.

    • 'min' is defined as the min value.

    • 'max' is defined as the max value.

    • 'diff' is defined as the difference between ground truth and raw images. Requires y_dataset to be provided.

    • 'diff_by_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between raw image max and min.

    • 'target_mean' is defined as the mean intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'target_min' is defined as the min intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'target_max' is defined as the max intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'diff_by_target_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between ground truth image max and min.

  • val_filter_vals (list of lists of int/float) – Threshold values for the properties listed in val_filter_props. Each value is compared with the corresponding property, using the matching sign in val_filter_signs, to decide whether a sample is removed.

  • val_filter_signs (list of lists of str) – Comparison signs used for validation data filtering. Options: ['gt', 'ge', 'lt', 'le'], which correspond to "greater than" (>), "greater than or equal" (>=), "less than" (<), and "less than or equal" (<=).

  • filter_by_entire_image (bool, optional) –

    Determines how filtering is applied, when filtering is enabled:

    • True: apply filter image by image.

    • False: apply filtering sample by sample. Each sample represents a patch within an image.

  • norm_before_filter (bool, optional) – Whether to apply normalization before filtering. Be aware that normalization changes the values the filter conditions are evaluated on.

  • random_crops_in_DA (bool, optional) – Tells the method that no data preparation is needed, since random subvolumes will be extracted during data augmentation (DA) and the whole volume will be used for that.

  • y_upscaling (2D/3D int tuple, optional) – Upscaling to apply when loading Y data. Used for the super-resolution workflow.

  • gt_channels_expected (int, optional) – Expected number of channels in the GT.

  • reflect_to_complete_shape (bool, optional) – Whether to enlarge any dimension that is smaller than the selected patch size by padding it in 'reflect' mode.

  • convert_to_rgb (bool, optional) – If RGB images are expected (e.g. the channel dimension of crop_shape is 3), grayscale images are converted to RGB.

  • is_y_mask (bool, optional) – Whether the data are masks. It is used to control the preprocessing of the data.

  • is_3d (bool, optional) – Whether the images to read are expected to be 3D.

  • train_zarr_data_information (dict, optional) – Additional information when using Zarr/H5 files for training. The following keys are expected:

    • "raw_path", str: path where the raw images reside within the zarr (used when multiple_data_within_zarr is True).

    • "gt_path", str: path where the mask images reside within the zarr (used when multiple_data_within_zarr is True).

    • "use_gt_path", bool: whether the GT should be used or not.

    • "multiple_data_within_zarr", bool: whether your input Zarr contains the raw images and labels together or not.

    • "input_img_axes", tuple of int: order of the axes of the images.

    • "input_mask_axes", tuple of int: order of the axes of the masks.

  • val_zarr_data_information (dict, optional) – Additional information when using Zarr/H5 files for validation. Same keys as train_zarr_data_information are expected.

  • multiple_raw_images (bool, optional) – Whether a folder of folders is expected, with one subfolder per image. Each subfolder holds different versions of the same image. See the Light My Cells tutorial for a real use case and a more detailed description. This is used when PROBLEM.IMAGE_TO_IMAGE.MULTIPLE_RAW_ONE_TARGET_LOADER is selected.

  • save_filtered_images (bool, optional) – Whether to save the filtered images.

  • save_filtered_images_dir (str, optional) – Directory to save filtered images.

  • save_filtered_images_num (int, optional) – Number of filtered images to save. Only works when save_filtered_images is True.

Returns:

  • X_train (BiaPyDataset) – Loaded train X dataset.

  • Y_train (BiaPyDataset) – Loaded train Y dataset.

  • X_val (BiaPyDataset) – Loaded validation X dataset.

  • Y_val (BiaPyDataset) – Loaded validation Y dataset.
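
The filter semantics described for train_filter_props, train_filter_vals and train_filter_signs can be illustrated with a minimal sketch. This is not BiaPy's implementation: the helper should_drop and the per-sample stats dictionaries are hypothetical, and the property values are assumed to be precomputed. It only encodes the rule stated above, namely that a sample is removed when every condition of at least one sublist holds.

```python
import operator
from typing import Dict, List

# Comparison names as documented: 'gt', 'ge', 'lt', 'le'.
SIGNS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def should_drop(
    stats: Dict[str, float],
    filter_props: List[List[str]],
    filter_vals: List[List[float]],
    filter_signs: List[List[str]],
) -> bool:
    """Return True when all conditions of at least one sublist hold (hypothetical helper)."""
    for props, vals, signs in zip(filter_props, filter_vals, filter_signs):
        if all(SIGNS[s](stats[p], v) for p, v, s in zip(props, vals, signs)):
            return True
    return False


# Hypothetical precomputed statistics; the numbers are illustrative only.
dark_sample = {"foreground": 0.30, "mean": 12.0, "max": 40.0}
empty_sample = {"foreground": 0.01, "mean": 90.0, "max": 200.0}
good_sample = {"foreground": 0.30, "mean": 90.0, "max": 200.0}

# Remove a sample if foreground < 0.05, OR if (mean < 20 AND max < 50).
props = [["foreground"], ["mean", "max"]]
vals = [[0.05], [20, 50]]
signs = [["lt"], ["lt", "lt"]]

results = [
    should_drop(s, props, vals, signs)
    for s in (dark_sample, empty_sample, good_sample)
]
print(results)  # [True, True, False]
```

With the default empty lists no sublist exists, so no sample is removed in this sketch.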

biapy.data.data_manipulation.load_and_prepare_test_data(test_path: str, test_mask_path: str | None, multiple_raw_images: bool | None = False, test_zarr_data_information: Dict | None = None) Tuple[BiaPyDataset, BiaPyDataset | None, List][source]

Load test data.

Parameters:
  • test_path (str) – Path to the test data.

  • test_mask_path (str) – Path to the test data masks.

  • multiple_raw_images (bool, optional) – Whether a folder of folders is expected, with one subfolder per image. Each subfolder holds different versions of the same image. See the Light My Cells tutorial for a real use case and a more detailed description. This is used when PROBLEM.IMAGE_TO_IMAGE.MULTIPLE_RAW_ONE_TARGET_LOADER is selected.

  • test_zarr_data_information (dict, optional) –

    Additional information when using Zarr/H5 files for test. The following keys are expected:
    • "raw_path", str: path where the raw images reside within the zarr.

    • "gt_path", str: path where the mask images reside within the zarr.

    • "use_gt_path", bool: whether the GT should be used or not.

Returns:

  • X_test (list of dict) –

    Loaded test X data. Each item in the list represents a sample of the dataset. Each sample is represented as follows:
    • "filename", str: name of the image to extract the data sample from.

    • "dir", str: directory where the image resides.

  • Y_test (list of dict, optional) –

    Loaded test Y data. Each item in the list represents a sample of the dataset. Each sample is represented as follows:
    • "train_path", str: name of the image to extract the data sample from.

    • "dir", str: directory where the image resides.

  • test_filenames (list of str) – List of test filenames.

biapy.data.data_manipulation.load_and_prepare_cls_test_data(test_path: str, norm_module: Dict, use_val_as_test: bool, expected_classes: int, crop_shape: Tuple[int, ...], is_3d: bool = True, reflect_to_complete_shape: bool = True, convert_to_rgb: bool = False, use_val_as_test_info: Dict | None = None)[source]

Load test data for classification workflows.

Parameters:
  • test_path (str) – Path to the test data.

  • norm_module (Dict) – Information about the normalization.

  • use_val_as_test (bool) – Whether to use validation data as test.

  • expected_classes (int) – Expected number of classes to be loaded.

  • crop_shape (3D/4D int tuple) – Shape of the crops. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

  • is_3d (bool, optional) – Whether the data to load is expected to be 3D or not.

  • reflect_to_complete_shape (bool, optional) – Whether to enlarge any dimension that is smaller than the selected patch size by padding it in 'reflect' mode.

  • convert_to_rgb (bool, optional) – If RGB images are expected (e.g. the channel dimension of crop_shape is 3), grayscale images are converted to RGB.

  • use_val_as_test_info (dict, optional) – Additional information to create the test set based on the validation. Used when use_val_as_test is True. The expected keys of the dictionary are as follows:

    • "cross_val_samples_ids", list of int: ids of the validation samples (out of the cross validation).

    • "train_path", str: training path, as the data must be extracted from there.

    • "selected_fold", int: fold selected in cross validation.

    • "n_splits", int: folds to create in cross validation.

    • "shuffle", bool: whether to shuffle the data or not.

    • "seed", int: seed for the random number generator.

Returns:

  • X_test (list of dict) –

    Loaded test data. Each item in the list represents a sample of the dataset. Each sample is represented as follows:

    • "filename", str: name of the image to extract the data sample from.

    • "dir", str: directory where the image resides.

    • "class_name", str: name of the class.

    • "class", int: represents the class (-1 if no ground truth provided).

  • test_filenames (list of str) – List of test filenames.
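
The keys "n_splits", "selected_fold", "shuffle" and "seed" describe a cross-validation split. The following is a minimal sketch of how such a split selects validation sample ids, using only the standard library. It is a plain, non-stratified illustration under stated assumptions: the function kfold_ids is hypothetical, folds are assumed to be numbered from 1, and BiaPy may rely on a different splitter internally.

```python
import random
from typing import List, Tuple


def kfold_ids(
    n_samples: int,
    n_splits: int,
    selected_fold: int,
    shuffle: bool = True,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_ids, val_ids) for one fold (hypothetical helper, 1-based fold)."""
    if not 1 <= selected_fold <= n_splits:
        raise ValueError("selected_fold must be in [1, n_splits]")
    ids = list(range(n_samples))
    if shuffle:
        # A dedicated generator keeps the split reproducible for a given seed.
        random.Random(seed).shuffle(ids)
    # Fold k takes every n_splits-th element starting at offset k.
    folds = [ids[k::n_splits] for k in range(n_splits)]
    val_ids = sorted(folds[selected_fold - 1])
    val_set = set(val_ids)
    train_ids = [i for i in range(n_samples) if i not in val_set]
    return train_ids, val_ids


train_ids, val_ids = kfold_ids(10, n_splits=5, selected_fold=2, shuffle=False)
print(val_ids)    # [1, 6]
print(train_ids)  # [0, 2, 3, 4, 5, 7, 8, 9]
```

Every sample appears in the validation set of exactly one fold, which is why the validation ids of a fold can later be reused as a test set.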

biapy.data.data_manipulation.load_data_from_dir(data_path: str, is_3d: bool = False) List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]

Load the images found in the given directory.

Parameters:
  • data_path (str) – Path to read the images from.

  • is_3d (bool, optional) – Whether the images to read are expected to be 3D.

biapy.data.data_manipulation.load_cls_data_from_dir(data_path: str, norm_module: Dict, expected_classes: int, crop_shape: Tuple[int, ...] | None, is_3d: bool = True, reflect_to_complete_shape: bool = True, convert_to_rgb: bool = False, preprocess_f: Callable | None = None, preprocess_cfg: Dict | None = None) BiaPyDataset[source]

Create dataset samples from the given path, following a classification workflow directory tree.

Parameters:
  • data_path (str) – Path to read the images from.

  • norm_module (Dict) – Information about the normalization.

  • expected_classes (int) – Expected number of classes to be loaded.

  • crop_shape (3D/4D int tuple, optional) – Shape of the crops. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

  • is_3d (bool, optional) – Whether the images to read are expected to be 3D.

  • reflect_to_complete_shape (bool, optional) – Whether to enlarge any dimension that is smaller than the selected patch size by padding it in 'reflect' mode.

  • convert_to_rgb (bool, optional) – If RGB images are expected (e.g. the channel dimension of crop_shape is 3), grayscale images are converted to RGB.

  • preprocess_f (function, optional) – The preprocessing function. Only needed if you want to apply preprocessing.

  • preprocess_cfg (dict, optional) – Configuration parameters for preprocessing. Only needed if you want to apply preprocessing.

Returns:

data_samples – Dataset created out of data_path.

Return type:

BiaPyDataset
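
A classification directory tree, where each subfolder of data_path represents one class, can be turned into sample records along these lines. This is a minimal sketch, not the library's code: the helper samples_from_class_dirs is hypothetical, the dictionary keys mirror the sample description given for the classification loaders in this module, and assigning class ids by sorted folder name is an assumption.

```python
import os
import tempfile
from typing import Dict, List


def samples_from_class_dirs(data_path: str) -> List[Dict]:
    """Build one record per file, using subfolder names as class names (hypothetical helper)."""
    class_names = sorted(
        d for d in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, d))
    )
    samples = []
    for class_id, class_name in enumerate(class_names):
        class_dir = os.path.join(data_path, class_name)
        for filename in sorted(os.listdir(class_dir)):
            samples.append(
                {
                    "filename": filename,
                    "dir": class_dir,
                    "class_name": class_name,
                    "class": class_id,
                }
            )
    return samples


# Build a throwaway tree: <root>/cat/a.tif, <root>/cat/b.tif, <root>/dog/c.tif
with tempfile.TemporaryDirectory() as root:
    for class_name, files in {"cat": ["a.tif", "b.tif"], "dog": ["c.tif"]}.items():
        os.makedirs(os.path.join(root, class_name))
        for name in files:
            # Empty placeholder files: only names are listed, nothing is read.
            open(os.path.join(root, class_name, name), "w").close()
    samples = samples_from_class_dirs(root)

summary = [(s["class_name"], s["class"], s["filename"]) for s in samples]
print(summary)
```

Only file names and directories are recorded here; no image data is read.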

biapy.data.data_manipulation.load_and_prepare_train_data_cls(train_path: str, train_in_memory: bool, val_path: str, val_in_memory: bool, expected_classes: int, norm_module: Dict, crop_shape: Tuple[int, ...], cross_val: bool = False, cross_val_nsplits: int = 5, cross_val_fold: int = 1, val_split: float = 0.1, seed: int = 0, shuffle_val: bool = True, train_preprocess_f: Callable | None = None, train_preprocess_cfg: Dict | None = None, train_filter_props: List[List[str]] = [], train_filter_vals: List[List[float | int]] = [], train_filter_signs: List[List[str]] = [], val_preprocess_f: Callable | None = None, val_preprocess_cfg: Dict | None = None, val_filter_props: List[List[str]] = [], val_filter_vals: List[List[float | int]] = [], val_filter_signs: List[List[str]] = [], norm_before_filter: bool = False, reflect_to_complete_shape: bool = False, convert_to_rgb: bool = False, is_3d: bool = False)[source]

Load data to train classification methods.

Parameters:
  • train_path (str) – Path to the training data.

  • train_in_memory (bool) – Whether the train data must be loaded in memory or not.

  • val_path (str) – Path to the validation data.

  • val_in_memory (bool) – Whether the validation data must be loaded in memory or not.

  • expected_classes (int) – Expected number of classes to be loaded.

  • norm_module (Dict) – Information about the normalization.

  • crop_shape (3D/4D int tuple) – Shape of the crops. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

  • cross_val (bool, optional) – Whether to use cross validation or not.

  • cross_val_nsplits (int, optional) – Number of folds for the cross validation.

  • cross_val_fold (int, optional) – Number of the fold to be used as validation.

  • val_split (float, optional) – Fraction of the train data used as validation (value between 0 and 1).

  • seed (int, optional) – Seed value.

  • shuffle_val (bool, optional) – Take random training examples to create validation data.

  • train_preprocess_f (function, optional) – The train preprocessing function. Only needed if you want to apply preprocessing.

  • train_preprocess_cfg (dict, optional) – Configuration parameters for train preprocessing. Only needed if you want to apply preprocessing.

  • train_filter_props (list of lists of str) – Filter conditions to be applied to the train data. The three variables filter_props, filter_vals and filter_signs together compose a list of conditions used to remove samples from the list. Each of them is a list of lists of conditions. For instance, the conditions can look like this: [['A'], ['B','C']]. A sample is then removed if it satisfies the first list of conditions (only 'A' in this case, from the ['A'] list) or if it satisfies both 'B' and 'C' (from the ['B','C'] list). Within each sublist all the conditions must be satisfied. Available properties are: ['foreground', 'mean', 'min', 'max'].

    Each property description:

    • 'foreground' is defined as the mask foreground percentage.

    • 'mean' is defined as the mean value.

    • 'min' is defined as the min value.

    • 'max' is defined as the max value.

    • 'diff' is defined as the difference between ground truth and raw images. Requires y_dataset to be provided.

    • 'diff_by_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between raw image max and min.

    • 'target_mean' is defined as the mean intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'target_min' is defined as the min intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'target_max' is defined as the max intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'diff_by_target_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between ground truth image max and min.

  • train_filter_vals (list of lists of int/float) – Threshold values for the properties listed in train_filter_props. Each value is compared with the corresponding property, using the matching sign in train_filter_signs, to decide whether a sample is removed.

  • train_filter_signs (list of lists of str) – Comparison signs used for train data filtering. Options: ['gt', 'ge', 'lt', 'le'], which correspond to "greater than" (>), "greater than or equal" (>=), "less than" (<), and "less than or equal" (<=).

  • val_preprocess_f (function, optional) – The validation preprocessing function. Only needed if you want to apply preprocessing.

  • val_preprocess_cfg (dict, optional) – Configuration parameters for validation preprocessing. Only needed if you want to apply preprocessing.

  • val_filter_props (list of lists of str) – Filter conditions to be applied to the validation data. The three variables filter_props, filter_vals and filter_signs together compose a list of conditions used to remove images from the list. Each of them is a list of lists of conditions. For instance, the conditions can look like this: [['A'], ['B','C']]. A sample is then removed if it satisfies the first list of conditions (only 'A' in this case, from the ['A'] list) or if it satisfies both 'B' and 'C' (from the ['B','C'] list). Within each sublist all the conditions must be satisfied. Available properties are: ['foreground', 'mean', 'min', 'max'].

    Each property description:

    • 'foreground' is defined as the mask foreground percentage.

    • 'mean' is defined as the mean value.

    • 'min' is defined as the min value.

    • 'max' is defined as the max value.

    • 'diff' is defined as the difference between ground truth and raw images. Requires y_dataset to be provided.

    • 'diff_by_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between raw image max and min.

    • 'target_mean' is defined as the mean intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'target_min' is defined as the min intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'target_max' is defined as the max intensity value of the raw image targets. Requires y_dataset to be provided.

    • 'diff_by_target_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between ground truth image max and min.

  • val_filter_vals (list of lists of int/float) – Threshold values for the properties listed in val_filter_props. Each value is compared with the corresponding property, using the matching sign in val_filter_signs, to decide whether a sample is removed.

  • val_filter_signs (list of lists of str) – Comparison signs used for validation data filtering. Options: ['gt', 'ge', 'lt', 'le'], which correspond to "greater than" (>), "greater than or equal" (>=), "less than" (<), and "less than or equal" (<=).

  • reflect_to_complete_shape (bool, optional) – Whether to enlarge any dimension that is smaller than the selected patch size by padding it in 'reflect' mode.

  • convert_to_rgb (bool, optional) – If RGB images are expected (e.g. the channel dimension of crop_shape is 3), grayscale images are converted to RGB.

  • is_3d (bool, optional) – Whether the images to read are expected to be 3D.

Returns:

  • X_train (list of dict) –

    Loaded train data. Each item in the list represents a sample of the dataset. Each sample is represented as follows:

    • "filename", str: name of the image to extract the data sample from.

    • "dir", str: directory where the image resides.

    • "class_name", str: name of the class.

    • "class", int: represents the class (-1 if no ground truth provided).

    • "img", ndarray (optional): image sample itself. It is of (y, x, channels) in 2D and (z, y, x, channels) in 3D. Provided when train_in_memory is True.

  • X_val (list of dict) –

    Loaded validation data. Each item in the list represents a sample of the dataset. Each sample is represented as follows:

    • "filename", str: name of the image to extract the data sample from.

    • "dir", str: directory where the image resides.

    • "class_name", str: name of the class.

    • "class", int: represents the class (-1 if no ground truth provided).

    • "img", ndarray (optional): image sample itself. It is of (y, x, channels) in 2D and (z, y, x, channels) in 3D. Provided when val_in_memory is True.

  • x_val_ids (list of int) – Indexes of the samples belonging to the validation set. Used in cross-validation.
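
When cross validation is disabled, val_split, shuffle_val and seed together describe a simple hold-out split. The sketch below shows one way such a split can be derived. It is illustrative only: holdout_split is a hypothetical helper, and both the rounding of the validation size and the choice of taking the validation samples from the end of the list are assumptions, not documented behaviour.

```python
import random
from typing import List, Tuple


def holdout_split(
    n_samples: int,
    val_split: float = 0.1,
    shuffle_val: bool = True,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_ids, val_ids) for a hold-out split (hypothetical helper)."""
    if not 0 < val_split < 1:
        raise ValueError("val_split must be between 0 and 1")
    ids = list(range(n_samples))
    if shuffle_val:
        # Same seed, same split: the generator is local to this call.
        random.Random(seed).shuffle(ids)
    # Assumption: at least one validation sample, size rounded to nearest.
    n_val = max(1, round(n_samples * val_split))
    val_ids = sorted(ids[-n_val:])
    train_ids = sorted(ids[:-n_val])
    return train_ids, val_ids


train_ids, val_ids = holdout_split(20, val_split=0.1, shuffle_val=False)
print(val_ids)         # [18, 19]
print(len(train_ids))  # 18
```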

biapy.data.data_manipulation.samples_from_image_list(list_of_data: List[str], data_path: str, crop_shape: Tuple[int, ...], ov: Tuple[float, ...], padding: Tuple[int, ...], norm_module: Dict, crop: bool = True, is_mask: bool = False, is_3d: bool = True, reflect_to_complete_shape: bool = True, convert_to_rgb: bool = False, preprocess_f: Callable | None = None, preprocess_cfg: Dict | None = None) BiaPyDataset[source]

Create dataset samples from the given list. This function does not load the data.

Parameters:
  • list_of_data (list of str) – Filenames of the images to read.

  • data_path (str) – Directory of the images to read.

  • crop_shape (3D/4D int tuple) – Shape of the crops. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

  • ov (2D/3D float tuple) – Minimum overlap along each axis. Values must be in the range [0, 1), that is, from 0% to 99% overlap. Shape is (y, x) for 2D or (z, y, x) for 3D.

  • padding (2D/3D int tuple) – Size of padding to be added on each axis. Shape is (y, x) for 2D or (z, y, x) for 3D.

  • norm_module (Dict) – Information about the normalization.

  • crop (bool, optional) – Whether the data needs to be cropped.

  • is_mask (bool, optional) – Whether the data are masks. It is used to control the preprocessing of the data.

  • is_3d (bool, optional) – Whether the data to load is expected to be 3D or not.

  • reflect_to_complete_shape (bool, optional) – Whether to enlarge any dimension that is smaller than the selected patch size by padding it in 'reflect' mode.

  • convert_to_rgb (bool, optional) – If RGB images are expected (e.g. the channel dimension of crop_shape is 3), grayscale images are converted to RGB.

  • preprocess_f (function, optional) – The preprocessing function. Only needed if you want to apply preprocessing.

  • preprocess_cfg (dict, optional) – Configuration parameters for preprocessing. Only needed if you want to apply preprocessing.

Returns:

dataset – Dataset.

Return type:

BiaPyDataset
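
To make the meaning of a minimum overlap concrete, the following sketch computes patch start offsets along a single axis so that consecutive patches overlap by at least the requested fraction and the last patch ends exactly at the image border. It is an illustration under stated assumptions, not BiaPy's cropping routine: patch_starts is a hypothetical helper, and the library may distribute patches differently.

```python
import math
from typing import List


def patch_starts(axis_len: int, patch_len: int, min_overlap: float) -> List[int]:
    """Start offsets of patches covering [0, axis_len) (hypothetical helper)."""
    if not 0 <= min_overlap < 1:
        raise ValueError("min_overlap must be in [0, 1)")
    if patch_len > axis_len:
        raise ValueError("patch is longer than the axis; pad the image first")
    if patch_len == axis_len:
        return [0]
    # Largest step between consecutive starts that keeps the requested overlap.
    max_step = max(1, int(patch_len * (1 - min_overlap)))
    span = axis_len - patch_len  # offset of the last patch
    # Enough patches so that no step has to exceed max_step.
    n_patches = math.ceil(span / max_step) + 1
    # Spread the starts evenly; integer division keeps every step <= max_step.
    return [(i * span) // (n_patches - 1) for i in range(n_patches)]


print(patch_starts(10, 4, 0.0))  # [0, 3, 6]
print(patch_starts(10, 4, 0.5))  # [0, 2, 4, 6]
```

In the first call the patches overlap by one pixel even though 0 was requested, which is why the parameter is described as a minimum: the actual overlap grows when needed to cover the axis exactly.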

biapy.data.data_manipulation.samples_from_zarr(list_of_data: List[str], data_path: str, zarr_data_info: Dict, crop_shape: Tuple[int, ...], ov: Tuple[float, ...], padding: Tuple[int, ...], is_mask: bool = False, is_3d: bool = True) BiaPyDataset[source]

Create dataset samples from the given list. This function does not load the data.

Parameters:
  • list_of_data (list of str) – Filenames of the images to read.

  • data_path (str) – Directory of the images to read.

  • zarr_data_info (dict) –

    Additional information when using Zarr/H5 files for training. The following keys are expected:
    • "raw_path": path where the raw images reside within the zarr (used when multiple_data_within_zarr is True).

    • "gt_path": path where the mask images reside within the zarr (used when multiple_data_within_zarr is True).

    • "multiple_data_within_zarr": whether your input Zarr contains the raw images and labels together or not.

    • "input_img_axes": order of the axes of the images.

    • "input_mask_axes": order of the axes of the masks.

  • crop_shape (3D/4D int tuple) – Shape of the crops. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

  • ov (2D/3D float tuple, optional) – Minimum overlap along each axis. Values must be in the range [0, 1), that is, from 0% to 99% overlap. Shape is (y, x) for 2D or (z, y, x) for 3D.

  • padding (2D/3D int tuple, optional) – Size of padding to be added on each axis. Shape is (y, x) for 2D or (z, y, x) for 3D.

  • is_mask (bool, optional) – Whether the data are masks. It is used to control the preprocessing of the data.

  • is_3d (bool, optional) – Whether the data to load is expected to be 3D or not.

Returns:

dataset – Dataset.

Return type:

BiaPyDataset

biapy.data.data_manipulation.samples_from_image_list_multiple_raw_one_gt(data_path: str, gt_path: str, crop_shape: Tuple[int, ...], ov: Tuple[float, ...], padding: Tuple[int, ...], norm_module: Dict, crop: bool = True, is_3d: bool = True, reflect_to_complete_shape: bool = True, convert_to_rgb: bool = False, preprocess_f: Callable | None = None, preprocess_cfg: Dict | None = None) Tuple[BiaPyDataset, BiaPyDataset][source]

Create dataset samples from the given lists. This function does not load the data.

Parameters:
  • data_path (str) – Directory of the images to read.

  • gt_path (str) – Directory to read ground truth images from.

  • crop_shape (3D/4D int tuple) – Shape of the crops. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

  • ov (2D/3D float tuple) – Minimum overlap along each axis. Values must be in the range [0, 1), that is, from 0% to 99% overlap. Shape is (y, x) for 2D or (z, y, x) for 3D.

  • padding (2D/3D int tuple) – Size of padding to be added on each axis. Shape is (y, x) for 2D or (z, y, x) for 3D.

  • norm_module (Dict) – Information about the normalization.

  • crop (bool, optional) – Whether the data needs to be cropped.

  • is_3d (bool, optional) – Whether the data to load is expected to be 3D or not.

  • reflect_to_complete_shape (bool, optional) – Whether to enlarge any dimension that is smaller than the selected patch size by padding it with 'reflect' mode.

  • convert_to_rgb (bool, optional) – In case RGB images are expected, e.g. if crop_shape channel is 3, those images that are grayscale are converted into RGB.

  • preprocess_f (function, optional) – The preprocessing function. Only needed if you want to apply preprocessing.

  • preprocess_cfg (dict, optional) – Configuration parameters for preprocessing. Only needed if you want to apply preprocessing.

Returns:

  • dataset (BiaPyDataset) – X dataset.

  • gt_dataset (BiaPyDataset) – Y dataset.
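
The role of ov when cropping can be illustrated with a minimal, self-contained sketch that computes patch start offsets along a single axis so that consecutive patches overlap by at least the requested fraction. The helper name patch_starts is hypothetical, and the way the patches are spread along the axis is an assumption made for illustration; BiaPy's internal cropping logic may place patches differently.

```python
import math


def patch_starts(axis_len: int, patch_len: int, ov: float) -> list[int]:
    """Start offsets of patches along one axis with at least `ov` overlap.

    Hypothetical helper for illustration only (not part of BiaPy).
    """
    if not 0 <= ov < 1:
        raise ValueError("ov must be in the range [0, 1)")
    if axis_len <= patch_len:
        # A single patch covers the axis (padding may be needed elsewhere).
        return [0]
    # Largest integer step that still guarantees the minimum overlap.
    step = max(1, int(patch_len * (1 - ov)))
    # Number of patches needed to reach the end of the axis with that step.
    n = math.ceil((axis_len - patch_len) / step) + 1
    # Spread the starts so that the last patch ends exactly at the border.
    last = axis_len - patch_len
    return [(i * last) // (n - 1) for i in range(n)]


starts_no_overlap = patch_starts(10, 4, 0.0)
starts_half_overlap = patch_starts(10, 4, 0.5)
```

A larger ov produces more patches per axis, so the number of samples grows quickly when the overlap is raised on every dimension.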

biapy.data.data_manipulation.samples_from_class_list(data_path: str, norm_module: Dict, crop_shape: Tuple[int, ...] | None = None, expected_classes: int = -1, is_3d: bool = True, reflect_to_complete_shape: bool = True, convert_to_rgb: bool = False) BiaPyDataset[source]

Create dataset samples from the given path taking into account that each subfolder represents a class. This function does not load the data.

Parameters:
  • data_path (str) – Directory of the images to read.

  • norm_module (Dict) – Information about the normalization.

  • crop_shape (3D/4D int tuple, optional) – Shape of the crops. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

  • expected_classes (int, optional) – Expected number of classes to be loaded. Set to -1 if you don't expect any.

  • is_3d (bool, optional) – Whether the data to load is expected to be 3D or not.

  • reflect_to_complete_shape (bool, optional) – Whether to enlarge any dimension that is smaller than the selected patch size by padding it with 'reflect' mode.

  • convert_to_rgb (bool, optional) – In case RGB images are expected, e.g. if crop_shape channel is 3, those images that are grayscale are converted into RGB.

Returns:

sample_list – Samples generated out of data_path.

Return type:

list of DataSample

biapy.data.data_manipulation.filter_samples_by_properties(x_dataset: BiaPyDataset, is_3d: bool, filter_props: List[List[str]], filter_vals: List[List[float | int]], filter_signs: List[List[str]], crop_shape: Tuple[int, ...], reflect_to_complete_shape: bool = False, filter_by_entire_image: bool = True, norm_before_filter: bool = False, norm_module: Dict | None = None, y_dataset: BiaPyDataset | None = None, zarr_data_information: Dict | None = None, save_filtered_images: bool = True, save_filtered_images_dir: str | None = None, save_filtered_images_num: int = 3)[source]

Filter samples from x_dataset using defined conditions.

The filtering is done using the images each sample is extracted from. However, if zarr_data_information is provided, the function assumes that Zarr/H5 files are used, so the filtering is performed sample by sample.

Parameters:
  • x_dataset (BiaPyDataset) – X dataset to filter samples from.

  • is_3d (bool, optional) – Whether the data to load is expected to be 3D or not.

  • filter_props (list of lists of str) – Filter conditions to be applied. The three variables, filter_props, filter_vals and filter_signs, compose a list of conditions used to remove images from the list. Each is a list of lists of conditions. For instance, the conditions can look like this: [['A'], ['B','C']]. Then, if the sample satisfies the first list of conditions (only 'A' in this case, from the ['A'] list), or satisfies both 'B' and 'C' (from the ['B','C'] list), it will be removed. Within each sublist all the conditions must be satisfied. Available properties are: ['foreground', 'mean', 'min', 'max', 'diff', 'diff_by_min_max_ratio', 'target_mean', 'target_min', 'target_max', 'diff_by_target_min_max_ratio']. Description of each property:

    • 'foreground' is defined as the mask foreground percentage.

    • 'mean' is defined as the mean value.

    • 'min' is defined as the min value.

    • 'max' is defined as the max value.

    • 'diff' is defined as the difference between ground truth and raw images. Require y_dataset to be provided.

    • 'diff_by_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between raw image max and min.

    • 'target_mean' is defined as the mean intensity value of the raw image targets. Require y_dataset to be provided.

    • 'target_min' is defined as the min intensity value of the raw image targets. Require y_dataset to be provided.

    • 'target_max' is defined as the max intensity value of the raw image targets. Require y_dataset to be provided.

    • 'diff_by_target_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between ground truth image max and min.

  • filter_vals (list of lists of int/float) – Values against which the properties listed in filter_props are compared, one value per property.

  • filter_signs (list of lists of str) – Signs used for the comparison. Options: ['gt', 'ge', 'lt', 'le'], which correspond to "greater than" (">"), "greater or equal" (">="), "less than" ("<") and "less or equal" ("<=") comparisons.

  • crop_shape (3D/4D int tuple) – Shape of the crops. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

  • reflect_to_complete_shape (bool, optional) – Whether to enlarge any dimension that is smaller than the selected patch size by padding it with 'reflect' mode.

  • filter_by_entire_image (bool, optional) –

    This decides how the filtering is done:

    • True: apply filter image by image.

    • False: apply filtering sample by sample. Each sample represents a patch within an image.

  • norm_before_filter (bool, optional) – Whether to apply normalization before filtering. Be aware then that the values for filtering may change.

  • norm_module (Dict) – Information about the normalization.

  • y_dataset (BiaPyDataset, optional) – Y dataset to filter samples from.

  • zarr_data_information (dict, optional) –

    Additional information when using Zarr/H5 files for training. The following keys are expected:

    • "raw_path": path where the raw images reside within the zarr (used when multiple_data_within_zarr is True).

    • "gt_path": path where the mask images reside within the zarr (used when multiple_data_within_zarr is True).

    • "multiple_data_within_zarr": whether the input Zarr contains the raw images and labels together.

    • "input_img_axes": order of the axes of the images.

    • "input_mask_axes": order of the axes of the masks.

  • save_filtered_images (bool, optional) – Whether to save the filtered images.

  • save_filtered_images_dir (str, optional) – Directory to save filtered images.

  • save_filtered_images_num (int, optional) – Number of filtered images to save. Only works when save_filtered_images is True.

Returns:

  • new_x_filenames (list of dict) – x_dataset list filtered.

  • new_y_filenames (list of dict, optional) – y_dataset list filtered.

biapy.data.data_manipulation.sample_satisfy_conds(img: ndarray[tuple[int, ...], dtype[_ScalarType_co]], filter_props: List[List[str]], filter_vals: List[List[float | int]], filter_signs: List[List[str]], mask: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, img_ratio: float = 0, mask_ratio: float | None = 0) bool[source]

Check whether img satisfies at least one of the conditions composed by filter_props, filter_vals and filter_signs.

Parameters:
  • img (4D/5D Numpy array) – Image to check if satisfy conditions. E.g. (z, y, x, num_classes) for 3D or (y, x, num_classes) for 2D.

  • filter_props (list of lists of str) – Filter conditions to be applied. The three variables, filter_props, filter_vals and filter_signs, compose a list of conditions used to remove images from the list. Each is a list of lists of conditions. For instance, the conditions can look like this: [['A'], ['B','C']]. Then, if the sample satisfies the first list of conditions (only 'A' in this case, from the ['A'] list), or satisfies both 'B' and 'C' (from the ['B','C'] list), it will be removed. Within each sublist all the conditions must be satisfied. Available properties are: ['foreground', 'mean', 'min', 'max']. Description of each property:

    • 'foreground' is defined as the mask foreground percentage.

    • 'mean' is defined as the mean value of the input.

    • 'min' is defined as the min value of the input.

    • 'max' is defined as the max value of the input.

    • 'diff' is defined as the difference between ground truth and raw images. Require y_dataset to be provided.

    • 'diff_by_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between raw image max and min.

    • 'target_mean' is defined as the mean intensity value of the raw image targets. Require y_dataset to be provided.

    • 'target_min' is defined as the min intensity value of the raw image targets. Require y_dataset to be provided.

    • 'target_max' is defined as the max intensity value of the raw image targets. Require y_dataset to be provided.

    • 'diff_by_target_min_max_ratio' is defined as the difference between ground truth and raw images multiplied by the ratio between ground truth image max and min.

  • filter_vals (list of lists of int/float) – Values against which the properties listed in filter_props are compared, one value per property.

  • filter_signs (list of lists of str) – Signs used for the comparison. Options: ['gt', 'ge', 'lt', 'le'], which correspond to "greater than" (">"), "greater or equal" (">="), "less than" ("<") and "less or equal" ("<=") comparisons.

  • mask (4D/5D Numpy array, optional) – Mask used to check the "foreground" condition in filter_props. E.g. (z, y, x, num_classes) for 3D or (y, x, num_classes) for 2D.

  • img_ratio (float, optional) – Ratio of the input image. Expected to be (img.max - img.min) of the entire image.

  • mask_ratio (float, optional) – Ratio of the mask. Expected to be (mask.max - mask.min) of the entire mask.

Returns:

satisfy_conds – Whether the sample satisfies at least one of the conditions.

Return type:

bool
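
The OR-of-ANDs logic described for filter_props, filter_vals and filter_signs can be sketched in plain NumPy as follows. This is an illustrative re-implementation under stated assumptions, not BiaPy's code: the names SIGNS, measure and satisfies_any are hypothetical, only the 'foreground', 'mean', 'min' and 'max' properties are covered, and 'foreground' is assumed to be the fraction of non-zero mask pixels.

```python
import operator

import numpy as np

# Comparison names as documented for filter_signs.
SIGNS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def measure(prop, img, mask=None):
    """Compute one property of a sample (hypothetical helper)."""
    if prop == "foreground":
        if mask is None:
            raise ValueError("'foreground' needs a mask")
        # Assumption: foreground is the fraction of non-zero mask pixels.
        return float(np.count_nonzero(mask)) / mask.size
    if prop in ("mean", "min", "max"):
        return float(getattr(np, prop)(img))
    raise ValueError(f"property not covered by this sketch: {prop}")


def satisfies_any(img, props, vals, signs, mask=None):
    """True if every condition of at least one sublist holds."""
    for p_list, v_list, s_list in zip(props, vals, signs):
        if all(
            SIGNS[s](measure(p, img, mask), v)
            for p, v, s in zip(p_list, v_list, s_list)
        ):
            return True
    return False


# Flag samples with under 10% foreground, OR with mean > 0.5 AND max <= 1.0.
props = [["foreground"], ["mean", "max"]]
vals = [[0.1], [0.5, 1.0]]
signs = [["lt"], ["gt", "le"]]

dark = np.full((8, 8, 1), 0.2, dtype=np.float32)
bright = np.full((8, 8, 1), 0.8, dtype=np.float32)
sparse_mask = np.zeros((8, 8, 1), dtype=np.uint8)
sparse_mask[0, :2] = 1  # 2 of 64 pixels are foreground
full_mask = np.ones((8, 8, 1), dtype=np.uint8)
```

Here the dark image with the sparse mask matches the first sublist, while the bright image with the full mask matches the second one.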

biapy.data.data_manipulation.load_images_to_dataset(dataset: BiaPyDataset, crop_shape: Tuple[int, ...] | None, reflect_to_complete_shape: bool = False, convert_to_rgb: bool = False, is_mask: bool = False, is_3d: bool = False, preprocess_cfg: Dict | None = None, preprocess_f: Callable | None = None, zarr_data_information: Dict | None = None)[source]

Load images into the dataset: creating "img" key.

The process is faster if the samples extracted from the same image are in contiguous positions within the list.

Parameters:
  • dataset (BiaPyDataset) – Loaded data.

  • crop_shape (3D/4D int tuple) – Shape of the expected crops. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

  • reflect_to_complete_shape (bool, optional) – Whether to enlarge any dimension that is smaller than the selected patch size by padding it with 'reflect' mode.

  • convert_to_rgb (bool, optional) – In case RGB images are expected, e.g. if crop_shape channel is 3, those images that are grayscale are converted into RGB.

  • preprocess_cfg (dict, optional) – Configuration parameters for preprocessing. Only needed if you want to apply preprocessing.

  • is_mask (bool, optional) – Whether the data are masks. It is used to control the preprocessing of the data.

  • preprocess_f (function, optional) – The preprocessing function. Only needed if you want to apply preprocessing.

  • is_3d (bool, optional) – Whether the data to load is expected to be 3D or not.

  • zarr_data_information (dict, optional) – Additional information of where to find the data within the Zarr files.

biapy.data.data_manipulation.pad_and_reflect(img: ndarray[tuple[int, ...], dtype[_ScalarType_co]], crop_shape: Tuple[int, ...], verbose: bool = False) ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]

Pad an image, using reflection, so that every dimension smaller than crop_shape reaches that size.

Parameters:
  • img (3D/4D Numpy array) – Image to pad. E.g. (y, x, channels) or (z, y, x, channels).

  • crop_shape (Tuple of 3/4 int, optional) – Shape of the subvolumes to create when cropping. E.g. (y, x, channels) or (z, y, x, channels).

  • verbose (bool, optional) – Whether to output information.

Returns:

img – Image padded. E.g. (y, x, channels) for 2D and (z, y, x, channels) for 3D.

Return type:

3D/4D Numpy array
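
Reflect padding up to a target shape can be sketched with numpy.pad. The helper name pad_to_shape_reflect is hypothetical, and splitting the missing pixels between both sides of each axis is an assumption; pad_and_reflect may distribute the padding differently.

```python
import numpy as np


def pad_to_shape_reflect(img: np.ndarray, crop_shape: tuple) -> np.ndarray:
    """Pad spatial axes with reflection until they reach crop_shape.

    Hypothetical helper: the channel axis (last) is never padded.
    """
    pads = []
    for size, target in zip(img.shape[:-1], crop_shape[:-1]):
        missing = max(0, target - size)
        # Assumption: split the padding between both sides of the axis.
        pads.append((missing // 2, missing - missing // 2))
    pads.append((0, 0))  # leave channels untouched
    return np.pad(img, pads, mode="reflect")


small = np.arange(15).reshape(3, 5, 1)  # (y, x, channels)
padded = pad_to_shape_reflect(small, (6, 5, 1))
```

Axes that are already large enough are returned unchanged, since their pad widths are zero.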

biapy.data.data_manipulation.extract_patch_within_image(img: ndarray[tuple[int, ...], dtype[_ScalarType_co]], coords: PatchCoords, is_3d=False) ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]

Extract patch within the image.

Parameters:
  • img (3D/4D Numpy array) – Input image to extract the patch from. E.g. (y, x, channels) in 2D and (z, y, x, channels) in 3D.

  • coords (dict) –

    Coordinates of the crop where the following keys are expected:
    • "z_start": starting point of the patch in Z axis.

    • "z_end": end point of the patch in Z axis.

    • "y_start": starting point of the patch in Y axis.

    • "y_end": end point of the patch in Y axis.

    • "x_start": starting point of the patch in X axis.

    • "x_end": end point of the patch in X axis.

  • is_3d (bool, optional) – Whether the expected image is 3D.

Returns:

img – Extracted patch. E.g. (y, x, channels) in 2D and (z, y, x, channels) in 3D.

Return type:

3D/4D Numpy array
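
A minimal sketch of how the coordinate keys above map to array slicing is shown below. A plain dict stands in for the PatchCoords object of the signature, the helper name extract_patch is hypothetical, and the end coordinates are assumed to be exclusive, as in standard Python slicing.

```python
import numpy as np


def extract_patch(img: np.ndarray, coords: dict, is_3d: bool = False) -> np.ndarray:
    """Slice a patch out of img using start/end keys (hypothetical helper)."""
    y = slice(coords["y_start"], coords["y_end"])
    x = slice(coords["x_start"], coords["x_end"])
    if is_3d:
        z = slice(coords["z_start"], coords["z_end"])
        return img[z, y, x]
    return img[y, x]


image_2d = np.arange(20).reshape(4, 5, 1)  # (y, x, channels)
patch_2d = extract_patch(
    image_2d, {"y_start": 1, "y_end": 3, "x_start": 2, "x_end": 5}
)

volume = np.zeros((6, 8, 8, 2))  # (z, y, x, channels)
patch_3d = extract_patch(
    volume,
    {"z_start": 0, "z_end": 4, "y_start": 2, "y_end": 6, "x_start": 0, "x_end": 8},
    is_3d=True,
)
```

The channel axis is never sliced, so the patch keeps all channels of the input.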

biapy.data.data_manipulation.img_to_onehot_encoding(img: ndarray[tuple[int, ...], dtype[_ScalarType_co]], num_classes: int = 2) ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]

Convert the given image into one-hot encoded format.

The opposite function is onehot_encoding_to_img().

Parameters:
  • img (Numpy 3D/4D array) – Image. E.g. (y, x, channels) or (z, y, x, channels).

  • num_classes (int, optional) – Number of classes to distinguish.

Returns:

one_hot_labels – Data one-hot encoded. E.g. (y, x, num_classes) or (z, y, x, num_classes).

Return type:

Numpy 3D/4D array

biapy.data.data_manipulation.onehot_encoding_to_img(encoded_image: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]

Convert a one-hot encoded image into an image with just one channel in which each class is represented by an integer.

The opposite function is img_to_onehot_encoding().

Parameters:

encoded_image (Numpy 3D/4D array) – Image. E.g. (y, x, channels) or (z, y, x, channels).

Returns:

img – Decoded image with a single channel in which each class is represented by an integer.

Return type:

Numpy 3D/4D array
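
The relation between the two conversions can be illustrated with a NumPy round trip. The function names to_onehot and from_onehot are hypothetical, and the sketch assumes an integer label image with a single trailing channel; the BiaPy functions may handle dtypes and multi-channel inputs differently.

```python
import numpy as np


def to_onehot(labels: np.ndarray, num_classes: int = 2) -> np.ndarray:
    """(..., 1) integer labels -> (..., num_classes) one-hot array."""
    # Indexing an identity matrix by label yields one row per pixel.
    return np.eye(num_classes, dtype=np.uint8)[labels[..., 0]]


def from_onehot(encoded: np.ndarray) -> np.ndarray:
    """(..., num_classes) one-hot array -> (..., 1) integer labels."""
    return np.argmax(encoded, axis=-1)[..., np.newaxis].astype(np.uint8)


labels = np.array([[0, 1], [2, 1]], dtype=np.uint8)[..., np.newaxis]  # (y, x, 1)
encoded = to_onehot(labels, num_classes=3)  # (y, x, 3)
decoded = from_onehot(encoded)  # (y, x, 1)
```

Decoding with argmax recovers the original labels as long as each pixel has exactly one active class.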

biapy.data.data_manipulation.load_img_data(path: str, is_3d: bool = False, data_within_zarr_path: str | None = None) Tuple[ndarray[tuple[int, ...], dtype[Any]], str][source]

Load data from a given path.

Parameters:
  • path (str) – Path to the image to read.

  • is_3d (bool, optional) – Whether the expected image is 3D.

  • data_within_zarr_path (str, optional) – Path to find the data within the Zarr file. E.g. 'volumes.labels.neuron_ids'.

Returns:

  • data (Zarr, H5 or Numpy 3D/4D array) – Data read. E.g. (z, y, x, channels) for 3D or (y, x, channels) for 2D.

  • file (str) – File of the data read. Useful to close it in case it is an H5 file.

biapy.data.data_manipulation.read_img_as_ndarray(path: str, is_3d: bool = False) ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]

Read an image from a given path.

Parameters:
  • path (str) – Path to the image to read.

  • is_3d (bool, optional) – Whether the expected image is 3D.

Returns:

img – Image read. E.g. (z, y, x, channels) for 3D or (y, x, channels) for 2D.

Return type:

Numpy 3D/4D array

biapy.data.data_manipulation.imread(path: str) ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], str | None][source]

Read an image from a given path.

Previously, from skimage.io import imread was used, but it is now deprecated.

Parameters:

path (str) – Path to the image to read.

Returns:

img – Image read.

Return type:

Numpy array

biapy.data.data_manipulation.imwrite(path: str, image: ndarray[tuple[int, ...], dtype[_ScalarType_co]])[source]

Write data in the given path.

Previously, from skimage.io import imsave was used, but it is now deprecated.

Parameters:
  • path (str) – Path where the image will be written.

  • image (Numpy array) – Image to store.

biapy.data.data_manipulation.check_value(value: int | float | Tuple[int | float] | List[float | int] | ndarray[tuple[int, ...], dtype[_ScalarType_co]], value_range: Tuple[int | float, int | float] = (0, 1)) bool[source]

Check whether a value or a collection of values falls within a specified range.

This function supports individual values (int, float), lists or tuples of values, and NumPy arrays. If value is a list or tuple, all elements must fall within the specified value_range. For NumPy arrays, both the minimum and maximum values of the array must be within the range.

Parameters:
  • value (int, float, list, tuple or np.ndarray) – The value or collection of values to check.

  • value_range (tuple of (int or float), optional) – A (min, max) tuple specifying the inclusive range of valid values. Default is (0, 1).

Returns:

True if all values are within the specified range; False otherwise.

Return type:

bool
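
The behaviour described above for scalars, sequences and arrays can be sketched as follows. The function name in_range is hypothetical; this is an illustration of the documented semantics, not the library's implementation.

```python
import numpy as np


def in_range(value, value_range=(0, 1)) -> bool:
    """True if every value lies within the inclusive range (hypothetical helper)."""
    low, high = value_range
    if isinstance(value, np.ndarray):
        # For arrays it is enough to test the extremes.
        return bool(low <= value.min() and value.max() <= high)
    if isinstance(value, (list, tuple)):
        return all(low <= v <= high for v in value)
    return bool(low <= value <= high)


scalar_ok = in_range(0.5)
list_ok = in_range([0, 0.5, 1])
tuple_bad = in_range((0.2, 1.5))
array_ok = in_range(np.array([0, 255]), value_range=(0, 255))
```

Because the range is inclusive, the boundary values themselves are accepted.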

biapy.data.data_manipulation.data_range(x: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) str[source]

Determine the value range of a NumPy array commonly used in image data.

This function checks whether the input array falls within one of the standard intensity ranges used in image processing: [0, 1], [0, 255], or [0, 65535], corresponding to normalized float, 8-bit, or 16-bit unsigned integer images, respectively.

Parameters:

x (np.ndarray) – The input array whose range is to be determined.

Returns:

A string indicating the value range:

  • "01 range" for values in [0, 1]

  • "uint8 range" for values in [0, 255]

  • "uint16 range" for values in [0, 65535]

  • "none_range" if values fall outside these common ranges

Return type:

str

Raises:

ValueError – If the input is not a NumPy array.
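
A sketch of this classification, using the return strings listed above, could look like the following. The function name describe_range is hypothetical, and testing the ranges from narrowest to widest is an assumption about the order in which they are checked.

```python
import numpy as np


def describe_range(x) -> str:
    """Classify the value range of an array (hypothetical helper)."""
    if not isinstance(x, np.ndarray):
        raise ValueError("input must be a NumPy array")
    low, high = x.min(), x.max()
    # Assumption: ranges are tested from the narrowest to the widest.
    if low >= 0 and high <= 1:
        return "01 range"
    if low >= 0 and high <= 255:
        return "uint8 range"
    if low >= 0 and high <= 65535:
        return "uint16 range"
    return "none_range"


normalized = describe_range(np.array([0.0, 0.5, 1.0]))
eight_bit = describe_range(np.array([0, 200]))
sixteen_bit = describe_range(np.array([0, 60000]))
other = describe_range(np.array([-5, 3]))
```

Note that the classification depends on the values, not on the dtype: a float array holding values up to 200 is still reported as being in the uint8 range.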

biapy.data.data_manipulation.check_masks(path: str, n_classes: int = 2, is_3d: bool = False)[source]

Check whether the data masks have the correct labels by inspecting a few random images from the given path.

If the function gives no error one should assume that the masks are correct.

Parameters:
  • path (str) – Path to the data mask.

  • n_classes (int, optional) – Maximum classes that the masks must contain.

  • is_3d (bool, optional) – Whether the expected image is 3D.

biapy.data.data_manipulation.shape_mismatch_message(X_data: BiaPyDataset, Y_data: BiaPyDataset) str[source]

Build an error message with the shape mismatch between two provided data X_data and Y_data.

Parameters:
  • X_data (BiaPyDataset) – X data.

  • Y_data (BiaPyDataset) – Y data.

Returns:

mistmatch_message – Message containing which samples mismatch.

Return type:

str

biapy.data.data_manipulation.save_tif(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]], data_dir: str, filenames: List[str] | None = None, verbose: bool = True)[source]

Save images in the given directory.

If the input data has a dtype other than np.uint8, np.uint16 or np.float32, it is cast to np.float32 automatically. Otherwise, the axes are not set correctly when the resulting images are opened in Fiji/ImageJ.

Parameters:
  • X (4D/5D numpy array) – Data to save as images. The first dimension must be the number of images. E.g. (num_of_images, y, x, channels) or (num_of_images, z, y, x, channels).

  • data_dir (str) – Path to store X images.

  • filenames (List, optional) – Filenames that should be used when saving each image.

  • verbose (bool, optional) – To print saving information.
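
The dtype rule described for save_tif can be sketched in isolation as follows. The helper name tif_safe_dtype is hypothetical and the sketch does not write any file; it only shows which inputs would be cast to np.float32 under the rule stated above.

```python
import numpy as np

# Dtypes the description above lists as safe to write without casting.
SUPPORTED = (np.uint8, np.uint16, np.float32)


def tif_safe_dtype(x: np.ndarray) -> np.ndarray:
    """Cast to float32 unless the dtype is already supported (hypothetical helper)."""
    if x.dtype not in SUPPORTED:
        return x.astype(np.float32)
    return x


batch_int64 = np.zeros((2, 16, 16, 1), dtype=np.int64)
batch_uint16 = np.zeros((2, 16, 16, 1), dtype=np.uint16)

cast = tif_safe_dtype(batch_int64)
kept = tif_safe_dtype(batch_uint16)
```

Casting integer labels to float32 is exact only up to 2**24, so very large label values may lose precision.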

biapy.data.data_manipulation.save_tif_pair_discard(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Y: ndarray[tuple[int, ...], dtype[_ScalarType_co]], data_dir: str, suffix: str = '', filenames: List | None = None, discard: bool = True, verbose: bool = True)[source]

Save image and mask pairs in the given directory, optionally discarding the pairs whose mask has no label information.

Parameters:
  • X (4D/5D numpy array) – Data to save as images. The first dimension must be the number of images. E.g. (num_of_images, y, x, channels) or (num_of_images, z, y, x, channels).

  • Y (4D/5D numpy array) – Data mask to save. The first dimension must be the number of images. E.g. (num_of_images, y, x, channels) or (num_of_images, z, y, x, channels).

  • data_dir (str) – Path to store X images.

  • suffix (str, optional) – Suffix to apply on output directory.

  • filenames (List, optional) – Filenames that should be used when saving each image.

  • discard (bool, optional) – Whether to discard image/mask pairs if the mask has no label information.

  • verbose (bool, optional) – To print saving information.

biapy.data.data_manipulation.save_npy_files(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]], data_dir: str, filenames: List[str] | None = None, verbose: bool = True)[source]

Save the data as NumPy (.npy) files in the given directory.

Parameters:
  • X (4D/5D numpy array) – Data to save as images. The first dimension must be the number of images. E.g. (num_of_images, y, x, channels) or (num_of_images, z, y, x, channels).

  • data_dir (str) – Path to store X images.

  • filenames (List, optional) – Filenames that should be used when saving each image.

  • verbose (bool, optional) – To print saving information.

biapy.data.data_manipulation.reduce_dtype(x: ndarray[tuple[int, ...], dtype[_ScalarType_co]], x_min: float, x_max: float, out_min: float = 0, out_max: float = 1, out_type: str = 'float32', eps: float = 1e-06) ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]

Reduce the data type of the given input to the selected range.

It uses the following formula: results = ((x - x_min)/(x_max - x_min)) * (out_max - out_min)

Parameters:
  • x (3D/4D Numpy array) – Image whose data type is to be reduced. E.g. (y, x, channels) in 2D and (z, y, x, channels) in 3D.

  • x_min (float) – x_min in the formula above.

  • x_max (float) – x_max in the formula above.

  • out_min (float, optional) – out_min in the formula above.

  • out_max (float, optional) – out_max in the formula above.

  • out_type (str, optional) – Type of the output data.

  • eps (float, optional) – Epsilon to use in order to avoid zero division.

Returns:

x – Data type reduced image. E.g. (y, x, channels) in 2D and (z, y, x, channels) in 3D.

Return type:

3D/4D Numpy array
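
The formula above can be written as a short NumPy sketch. The function name rescale is hypothetical, and adding eps to the denominator is an assumption about how the zero division is avoided; the formula is followed literally, so out_min only affects the width of the output range.

```python
import numpy as np


def rescale(x, x_min, x_max, out_min=0.0, out_max=1.0, out_type="float32", eps=1e-6):
    """Apply ((x - x_min) / (x_max - x_min)) * (out_max - out_min).

    Hypothetical helper; eps placement is an assumption.
    """
    x = x.astype(np.float32)
    scaled = ((x - x_min) / (x_max - x_min + eps)) * (out_max - out_min)
    return scaled.astype(out_type)


raw = np.array([0, 128, 255], dtype=np.uint8).reshape(1, 3, 1)  # (y, x, channels)
normalized = rescale(raw, x_min=0, x_max=255)
```

With the default arguments, an 8-bit image is mapped approximately onto [0, 1] and returned as float32.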

biapy.data.data_manipulation.resize(input_data, size, mode='bilinear', **kwargs)[source]

Resize a multi-dimensional image tensor or array to a specified size.

This function resizes 2D or 3D image data in either PyTorch tensor or NumPy array format using appropriate interpolation methods. The input is expected to follow common conventions for image dimensions.

Supported input formats:

  • PyTorch tensor of shape (B, C, H, W) for 2D or (B, C, D, H, W) for 3D data

  • NumPy array of shape (B, H, W, C) for 2D or (B, D, H, W, C) for 3D data

Parameters:
  • input_data (torch.Tensor or np.ndarray) – The image data to be resized.

  • size (tuple of int) – Target size for each dimension. Must match the number of dimensions in input_data. Only spatial dimensions are resized (e.g., H, W, D), batch and channel dimensions are preserved.

  • mode (str, optional) – Interpolation mode to use. Must be one of the keys in interp_mode_map. Defaults to 'bilinear'.

  • **kwargs (dict) – Additional arguments passed to torch.nn.functional.interpolate or skimage.transform.resize.

Returns:

The resized image data in the same format as the input.

Return type:

torch.Tensor or np.ndarray

Raises:
  • ValueError – If the length of size does not match the number of dimensions in input_data, or if an unsupported interpolation mode is specified.

  • TypeError – If input_data is neither a PyTorch tensor nor a NumPy array.
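
The two layouts listed under the supported input formats differ only in where the channel axis sits. The sketch below does not call resize; it only shows, with hypothetical helper names and plain NumPy, how a channels-last array maps to the channels-first layout and back.

```python
import numpy as np


def channels_last_to_first(arr: np.ndarray) -> np.ndarray:
    """(B, ..., C) -> (B, C, ...): move the last axis next to the batch axis."""
    return np.moveaxis(arr, -1, 1)


def channels_first_to_last(arr: np.ndarray) -> np.ndarray:
    """(B, C, ...) -> (B, ..., C): move the channel axis to the end."""
    return np.moveaxis(arr, 1, -1)


batch_2d = np.zeros((2, 64, 48, 3))  # (B, H, W, C)
batch_3d = np.zeros((2, 8, 64, 48, 1))  # (B, D, H, W, C)

first_2d = channels_last_to_first(batch_2d)  # (B, C, H, W)
first_3d = channels_last_to_first(batch_3d)  # (B, C, D, H, W)
restored = channels_first_to_last(first_3d)  # back to (B, D, H, W, C)
```

Keeping this convention in mind helps when deciding which entries of size refer to spatial dimensions for each input type.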

biapy.data.data_manipulation.decide_dtype(num_values: int) dtype[source]

Decide the smallest unsigned integer dtype that can hold the given number of values.

Parameters:

num_values (int) – The number of distinct values that need to be represented.

Returns:

The smallest unsigned integer dtype that can represent num_values distinct values. Possible return values are np.uint8, np.uint16, or np.uint32.

Return type:

np.dtype

Raises:

ValueError – If num_values is negative or exceeds the maximum representable by np.uint32.
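
One way to pick the smallest unsigned dtype is to compare against numpy.iinfo limits, as in the sketch below. The function name smallest_uint is hypothetical, and comparing num_values against the maximum value of each dtype (rather than max + 1) is an assumption about the exact boundary used.

```python
import numpy as np


def smallest_uint(num_values: int) -> np.dtype:
    """Smallest unsigned integer dtype able to hold num_values (hypothetical helper)."""
    if num_values < 0:
        raise ValueError("num_values must be non-negative")
    for candidate in (np.uint8, np.uint16, np.uint32):
        # Assumption: the boundary is the maximum value of each dtype.
        if num_values <= np.iinfo(candidate).max:
            return np.dtype(candidate)
    raise ValueError("num_values exceeds the np.uint32 maximum")


few_labels = smallest_uint(10)
many_labels = smallest_uint(300)
huge_labels = smallest_uint(70000)
```

Choosing the narrowest dtype keeps label volumes small in memory and on disk, which matters for large 3D instance masks.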