biapy.data.dataset
Dataset utilities for organizing input data in BiaPy.
This module provides the foundational data structures for managing and organizing datasets within BiaPy. It includes representations for both individual data files and data samples, as well as the overall dataset structure used during training and inference.
Classes
- DatasetFile:
Represents metadata and statistics associated with an individual input file. This includes the file path, size, shape, and any derived properties needed for data handling.
- DataSample:
Encapsulates a single sample of the dataset, typically representing one training or inference instance. It stores indexing information (e.g., crop position, file ID) and can also include per-sample weights, masks, or labels.
- BiaPyDataset:
Main class that manages a full dataset, composed of a list of DatasetFile and a list of DataSample. Provides methods to clean or filter the dataset, and supports deep copying for safe reuse.
- PatchCoords:
Encapsulates the coordinates of a patch within an image.
Typical usage
from biapy.data.dataset import DatasetFile, DataSample, BiaPyDataset
# Assume dataset_info and sample_list are preconstructed lists of DatasetFile and DataSample
dataset = BiaPyDataset(dataset_info=dataset_info, sample_list=sample_list)
# Clean dataset by keeping only a subset of samples or images
dataset.clean_dataset(samples_to_maintain=[0, 2, 5], clean_by="sample")
- class biapy.data.dataset.BiaPyDataset(dataset_info: List[DatasetFile], sample_list: List[DataSample])
Bases: object
A lightweight container for dataset information used in BiaPy workflows.
This class stores and manages image-level and sample-level metadata for training, validation, or testing datasets. It encapsulates:
- dataset_info: A list of DatasetFile instances, each representing a full image file along with relevant metadata.
- sample_list: A list of DataSample instances, each representing a patch or sample extracted from one of the images in dataset_info.
- clean_dataset(samples_to_maintain: List[int] | ndarray[tuple[int, ...], dtype[_ScalarType_co]], clean_by: str = 'image')
Remove unwanted samples or images from the dataset.
This method filters the dataset to retain only a subset of samples or images. It also updates internal IDs to remain consistent after filtering.
- Parameters:
samples_to_maintain (list of int or ndarray): Indices of samples or images to retain, depending on clean_by.
clean_by (str, default="image"): Strategy for filtering the dataset. Must be one of:
- "sample": samples_to_maintain refers to sample indices.
- "image": samples_to_maintain refers to image indices.
- Raises:
AssertionError: If clean_by is not one of ["sample", "image"].
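The filtering behaviour described above can be sketched without BiaPy installed. The snippet below is an illustration only, not BiaPy's implementation: `FileStub`, `SampleStub`, and `clean` are hypothetical stand-ins. Two rules in it are assumptions: kept files are renumbered consecutively and each sample's `fid` is rewritten to match, and in "sample" mode files left without any sample are dropped.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FileStub:
    """Hypothetical stand-in for DatasetFile (only the path is kept)."""
    path: str


@dataclass
class SampleStub:
    """Hypothetical stand-in for DataSample (only the file ID is kept)."""
    fid: int


def clean(
    files: List[FileStub],
    samples: List[SampleStub],
    keep: Sequence[int],
    clean_by: str = "image",
) -> Tuple[List[FileStub], List[SampleStub]]:
    """Keep a subset of samples or images and renumber file IDs (assumed rule)."""
    assert clean_by in ["sample", "image"], "clean_by must be 'sample' or 'image'"
    if clean_by == "sample":
        # `keep` holds positions in the sample list.
        keep_set = {int(i) for i in keep}
        kept_samples = [s for i, s in enumerate(samples) if i in keep_set]
        used_fids = sorted({s.fid for s in kept_samples})
    else:
        # `keep` holds positions in the file list.
        used_fids = sorted({int(i) for i in keep})
        kept_samples = [s for s in samples if s.fid in used_fids]
    # Map old file IDs to new consecutive IDs so samples still point at their file.
    remap = {old: new for new, old in enumerate(used_fids)}
    new_files = [files[old] for old in used_fids]
    new_samples = [SampleStub(fid=remap[s.fid]) for s in kept_samples]
    return new_files, new_samples


files = [FileStub("a.tif"), FileStub("b.tif"), FileStub("c.tif")]
samples = [SampleStub(0), SampleStub(0), SampleStub(1), SampleStub(2), SampleStub(2)]

by_image = clean(files, samples, keep=[0, 2], clean_by="image")
by_sample = clean(files, samples, keep=[2, 4], clean_by="sample")
```

In both calls the surviving samples end up with file IDs that index directly into the shortened file list, which is the kind of consistency the method description refers to.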
- class biapy.data.dataset.DatasetFile(path: str, shape: Tuple | None = None, parallel_data: bool | None = None, input_axes: str | None = None, norm_info: Dict = None, class_num: int | None = None, class_name: str | None = None)
Bases: object
A data structure to store metadata and normalization statistics for a single input file.
This class encapsulates the file path, shape, and optional information required for preprocessing and normalization of bioimage data. It is used internally by BiaPy to organize and access input data consistently across different workflows.
- is_parallel() -> bool
Return whether the dataset file uses a parallel format (e.g., Zarr or H5).
- Returns:
True if the file is marked as parallel, False otherwise.
- Return type:
bool
- get_input_axes() -> str | None
Return the axes format string of the dataset, if defined.
- Returns:
The input axes string (e.g., "ZYXC"), or None if not set.
- Return type:
str or None
- get_class_num() -> int
Return the class index associated with the dataset file.
- Returns:
Class number if defined, otherwise -1.
- Return type:
int
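The fallback values of these accessors can be illustrated with a small stand-in class. `DatasetFileStub` is hypothetical and only mirrors the documented return contracts. Treating an unset `parallel_data` as `False` is an assumption; the -1 fallback for an unset class number comes from the description above.

```python
from typing import Optional


class DatasetFileStub:
    """Hypothetical stand-in mirroring the documented accessors of DatasetFile."""

    def __init__(
        self,
        path: str,
        parallel_data: Optional[bool] = None,
        input_axes: Optional[str] = None,
        class_num: Optional[int] = None,
    ) -> None:
        self.path = path
        self.parallel_data = parallel_data
        self.input_axes = input_axes
        self.class_num = class_num

    def is_parallel(self) -> bool:
        # Assumption: an unset flag (None) counts as "not parallel".
        return bool(self.parallel_data)

    def get_input_axes(self) -> Optional[str]:
        return self.input_axes

    def get_class_num(self) -> int:
        # Documented contract: -1 when no class number is defined.
        return self.class_num if self.class_num is not None else -1


plain = DatasetFileStub("image.tif")
chunked = DatasetFileStub("volume.zarr", parallel_data=True, input_axes="ZYXC", class_num=3)
```

Callers can therefore compare `get_class_num()` against -1 instead of checking for `None`, while `get_input_axes()` still needs a `None` check.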
- class biapy.data.dataset.DataSample(fid: int, coords: PatchCoords | None, img: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, gt_associated_id: int | None = None, input_axes: str | None = None, path_in_zarr: str | None = None)
Bases: object
Represents a single data sample extracted from a larger dataset file.
A DataSample contains metadata and optionally the image data of a subvolume or patch extracted from a parent image. It is primarily used to organize and manipulate training, validation, or test samples during deep learning workflows in BiaPy.
- img_is_loaded()
Check whether the image data has been loaded into memory.
- Returns:
True if image data is present in the sample, False otherwise.
- Return type:
bool
- get_shape() -> Tuple[int, int] | Tuple[int, int, int] | None
Get the spatial shape of the sample based on its coordinates.
- Returns:
A tuple representing the shape of the patch (2D or 3D), or None if coordinates are not defined.
- Return type:
tuple of int or None
- get_path_in_zarr() -> str | None
Get the internal path in the Zarr/H5 file, if available.
- Returns:
Path to the dataset within the file, or None if not set.
- Return type:
str or None
- get_gt_associated_id() -> int | None
Get the index of the ground truth sample associated with this input.
- Returns:
Index of the ground truth, or None if not set.
- Return type:
int or None
- class biapy.data.dataset.PatchCoords(y_start: int, y_end: int, x_start: int, x_end: int, z_start: int | None = None, z_end: int | None = None)
Bases: object
Coordinates of a 2D or 3D patch within an image volume.
This class stores the spatial boundaries of a patch, allowing BiaPy to extract or reference subvolumes from larger datasets. It supports both 2D (Y, X) and 3D (Z, Y, X) data.
- extract_shape_from_coords() -> Tuple[int, int] | Tuple[int, int, int]
Compute the spatial shape of the patch based on its coordinates.
- Returns:
A tuple representing the shape of the patch in the order (Z, Y, X) for 3D or (Y, X) for 2D, based on the presence of Z-axis coordinates.
- Return type:
tuple of int
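How a shape can be derived from the stored boundaries is sketched below. `PatchCoordsStub` is a hypothetical stand-in with the same constructor arguments as PatchCoords. It assumes that each `*_end` value is exclusive, so an extent is simply `end - start`; this sketch does not confirm that BiaPy follows the same convention.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class PatchCoordsStub:
    """Hypothetical stand-in for PatchCoords (end bounds assumed exclusive)."""
    y_start: int
    y_end: int
    x_start: int
    x_end: int
    z_start: Optional[int] = None
    z_end: Optional[int] = None

    def extract_shape_from_coords(
        self,
    ) -> Union[Tuple[int, int], Tuple[int, int, int]]:
        yx = (self.y_end - self.y_start, self.x_end - self.x_start)
        # Z bounds present: 3D patch, reported as (Z, Y, X).
        if self.z_start is not None and self.z_end is not None:
            return (self.z_end - self.z_start,) + yx
        # No Z bounds: 2D patch, reported as (Y, X).
        return yx


patch_2d = PatchCoordsStub(y_start=0, y_end=256, x_start=128, x_end=384)
patch_3d = PatchCoordsStub(
    y_start=0, y_end=256, x_start=128, x_end=384, z_start=10, z_end=30
)

shape_2d = patch_2d.extract_shape_from_coords()
shape_3d = patch_3d.extract_shape_from_coords()
```

Under these assumptions the 2D patch has shape (256, 256) and the 3D patch has shape (20, 256, 256).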