biapy.data.pre_processing
Pre-processing utilities for image and mask data in deep learning workflows.
This module provides pre-processing functions for instance segmentation, detection mask creation, self-supervised learning data generation, semantic segmentation probability maps, and general image processing operations such as resizing, blurring, edge detection, histogram matching, and CLAHE. It supports both 2D and 3D data formats and integrates with BiaPy configuration objects for flexible data pipelines.
- biapy.data.pre_processing.create_instance_channels(cfg: CfgNode, data_type: str = 'train')
Create new training and validation data with the appropriate channels, based on PROBLEM.INSTANCE_SEG.DATA_CHANNELS, for instance segmentation.
- Parameters:
cfg (YACS CN object) – Configuration.
data_type (str, optional) – Whether to create training or validation instance channels.
- biapy.data.pre_processing.unique_labels_fast(a: ndarray)
Find the unique labels in an integer array a with values in [0, K], in O(n) time and O(K) space.
- Parameters:
a (ndarray) – Input array of integers.
- Returns:
Array of unique labels.
- Return type:
ndarray
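The O(n) time and O(K) space bound can be met with a counting pass over the labels. The following is a minimal sketch of that idea using `np.bincount`; the helper name `unique_labels_bincount` is hypothetical and this is not necessarily how `unique_labels_fast` is implemented.

```python
import numpy as np

def unique_labels_bincount(a: np.ndarray) -> np.ndarray:
    """Sorted unique values of a non-negative integer array (illustrative helper)."""
    flat = np.asarray(a).ravel()
    if flat.size == 0:
        return flat.astype(np.int64)
    # Single pass over the data: counts[k] is the number of occurrences of label k.
    counts = np.bincount(flat)
    # Indices with a non-zero count are exactly the labels that are present.
    return np.flatnonzero(counts)

labels = np.array([[0, 3, 3], [7, 0, 3]])
present = unique_labels_bincount(labels)
```

Unlike `np.unique`, this avoids sorting, but the counts array grows with the largest label value, so it only pays off when K is moderate.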
- biapy.data.pre_processing.labels_into_channels(instance_labels: ndarray[tuple[int, ...], dtype[_ScalarType_co]], mode: List[str] = ['I', 'C'], channel_extra_opts: Dict = {}, resolution: List[float | int] = [1, 1, 1], save_dir: str | None = None) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Convert input semantic or instance segmentation masks into different binary channels to train an instance segmentation problem.
- Parameters:
instance_labels (3D/4D Numpy array) – Instance labels from which the channels are extracted. E.g. (200, 1000, 1000, 1).
mode (List, optional) – Operation mode. Possible values: C, BC, BCM, BCD, BD, BCDv2, Dv2, BDv2 and BP.
- 'B' stands for 'Binary segmentation', containing each instance region without the contour.
- 'C' stands for 'Contour', containing each instance contour.
- 'D' stands for 'Distance', where each pixel contains its distance to the center of the object.
- 'M' stands for 'Mask', which contains the B and the C channels, i.e. the foreground mask. It is obtained simply by binarizing the input instance masks.
- 'Dv2' stands for 'Distance V2', an updated version of the 'D' channel that also calculates the background distance.
- 'P' stands for 'Points' and contains the central points of an instance (as in the Detection workflow).
- 'A' stands for 'Affinities' and contains the affinity values for each dimension.
channel_extra_opts (dict, optional) – Additional options for each output channel (e.g., {'I': {'erosion': 1}}).
resolution (Tuple of int/float) – Resolution of the data, in (z,y,x), used to calibrate coordinates. E.g. [30,8,8].
save_dir (str, optional) – Path where samples of the created array are stored, for checking that it is correct.
- Returns:
new_mask – Instance representations. The shape is the same as the input instance_labels but with the number of channels requested. E.g. (200, 1000, 1000, 3).
- Return type:
3D/4D Numpy array
- biapy.data.pre_processing.norm_channel(channel: ndarray[tuple[int, ...], dtype[_ScalarType_co]], vol: ndarray[tuple[int, ...], dtype[_ScalarType_co]], instances: list[int]) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Normalize a channel based on instance masks.
- Parameters:
channel (NDArray) – The channel to normalize (e.g. db_channel).
vol (NDArray) – Instance mask volume, same shape as channel.
instances (list[int]) – List of instance IDs in vol. Background (0) is ignored.
- Returns:
Normalized channel, same shape as input.
- Return type:
NDArray
- biapy.data.pre_processing.slice_from_props(props_tbl: DataFrame | dict, i: int, ndim: int) -> tuple[slice, ...]
Get a slice representation from the properties table for a specific instance.
- Parameters:
props_tbl (pd.DataFrame | dict) – The properties table containing region properties.
i (int) – The index of the instance in the properties table.
ndim (int) – The number of dimensions (2 or 3).
- Returns:
A tuple of slice objects representing the bounding box of the instance.
- Return type:
tuple[slice, ...]
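As an illustration of turning a bounding box into slices, the sketch below assumes the table uses the column layout produced by `skimage.measure.regionprops_table` for the `bbox` property (`bbox-0` ... `bbox-(2*ndim-1)`, minimum coordinates first, then maximum coordinates). Whether `slice_from_props` expects exactly these keys is an assumption, and `bbox_slices` is a hypothetical name.

```python
def bbox_slices(props_tbl, i, ndim):
    """Build one slice per axis from 'bbox-*' columns (assumed key layout)."""
    return tuple(
        # Axis d runs from its minimum (bbox-d) to its exclusive maximum (bbox-(d+ndim)).
        slice(int(props_tbl[f"bbox-{d}"][i]), int(props_tbl[f"bbox-{d + ndim}"][i]))
        for d in range(ndim)
    )

# A 2D table with a single instance whose box spans rows 2..4 and columns 3..8.
props = {"bbox-0": [2], "bbox-1": [3], "bbox-2": [5], "bbox-3": [9]}
box = bbox_slices(props, 0, ndim=2)
```

The resulting tuple can index an array directly, e.g. `mask[box]`, which crops the instance without copying the whole volume.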
- biapy.data.pre_processing.unet_border_weight_map(instances: ndarray, w0: float = 10.0, sigma: float = 5.0, apply_only_background: bool = True, resolution: List[float | int] | None = None) -> ndarray
U-Net border-aware weight map (Ronneberger et al. 2015) for 2D or 3D labels.
- Parameters:
instances (np.ndarray, shape (H, W) or (D, H, W), dtype int) – Instance labels: 0 is background and 1..N (or any other non-zero integers) are instance IDs.
w0 (float) – Border weight magnitude.
sigma (float) – Spatial decay (in the same units as resolution).
apply_only_background (bool) – If True, apply the exponential term only on background (as in the paper).
resolution (List[int|float] | None) – Voxel spacing along each axis, (z,y,x) or (y,x). If None, isotropic spacing of 1 is assumed.
- Returns:
w – Border weight map.
- Return type:
np.ndarray, same shape as instances, dtype float32
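For reference, the border term defined in Ronneberger et al. (2015) can be written as below, where d_1(x) is the distance from position x to the border of the nearest instance and d_2(x) the distance to the border of the second nearest instance. This is the formula from the paper; how this function maps `apply_only_background` and `resolution` onto it is not documented here.

```latex
w_{\text{border}}(\mathbf{x}) = w_0 \, \exp\!\left( -\frac{\bigl(d_1(\mathbf{x}) + d_2(\mathbf{x})\bigr)^2}{2\sigma^2} \right)
```

In the paper this term is added to a class-balancing weight map, and the reported settings are w_0 = 10 and sigma of approximately 5 pixels, which match the defaults of this function.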
- biapy.data.pre_processing.touching_mask_nd(labels: ndarray[tuple[int, ...], dtype[_ScalarType_co]], connectivity: int = 1) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Create a binary mask of touching pixels/voxels for an N-D labeled instance mask.
- Parameters:
labels (NDArray) – N-D array of instance labels (0 = background, 1..N = instances).
connectivity (int, optional) – Neighborhood connectivity passed to generate_binary_structure. 1 = 6-neighborhood in 3D / 4-neighborhood in 2D, 2 = 18-neighborhood in 3D / 8-neighborhood in 2D, 3 = 26-neighborhood in 3D (only if ndim==3).
- Returns:
touch – Binary mask with 1 where a voxel touches at least one different instance.
- Return type:
NDArray
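To make the notion of "touching" concrete, here is a minimal NumPy-only sketch for the face-connected case (connectivity 1). It compares each element with its neighbor along every axis instead of using `generate_binary_structure`, so it is an illustration of the idea rather than the library's implementation; `touching_mask_faces` is a hypothetical name.

```python
import numpy as np

def touching_mask_faces(labels: np.ndarray) -> np.ndarray:
    """Mark elements whose face neighbor belongs to a different instance."""
    touch = np.zeros(labels.shape, dtype=np.uint8)
    for axis in range(labels.ndim):
        a = np.moveaxis(labels, axis, 0)  # view with the current axis first
        t = np.moveaxis(touch, axis, 0)   # matching view into the output
        lo, hi = a[:-1], a[1:]            # each element and its next neighbor
        # Two different, non-background labels side by side.
        diff = (lo != hi) & (lo > 0) & (hi > 0)
        t[:-1][diff] = 1
        t[1:][diff] = 1
    return touch

labels = np.array([[1, 1, 2],
                   [0, 0, 2],
                   [3, 0, 0]])
touch = touching_mask_faces(labels)
```

Here only the pair of pixels where instance 1 meets instance 2 in the first row is marked; instance 3 is separated from the others by background and stays unmarked.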
- biapy.data.pre_processing.generate_rays(n_rays: int, ndim: int, jitter: bool = False, seed: int = 0)
Generate unit direction vectors in R^ndim:
- 2D: uniform angles on the circle -> (R, 2) array of [dx, dy].
- 3D: Fibonacci sphere -> (R, 3) array of [dx, dy, dz].
- Parameters:
n_rays (int) – Number of rays to generate.
ndim (int) – Dimensionality (2 or 3).
jitter (bool, optional) – Whether to add jitter to 3D rays (default: False).
seed (int, optional) – Random seed for jitter (default: 0).
- Returns:
rays – Unit vectors along which to compute distances.
- Return type:
(n_rays, 2) or (n_rays, 3) Numpy array
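The two sampling schemes named above can be sketched as follows. This is an illustrative version under stated assumptions: it uses one common Fibonacci-sphere parameterization (golden-angle increments with half-step offsets), omits the jitter option, and the name `unit_rays` is hypothetical.

```python
import numpy as np

def unit_rays(n_rays: int, ndim: int) -> np.ndarray:
    """Roughly uniform unit directions: circle in 2D, Fibonacci sphere in 3D."""
    k = np.arange(n_rays)
    if ndim == 2:
        # Evenly spaced angles on the circle, returned as [dx, dy].
        theta = 2.0 * np.pi * k / n_rays
        return np.stack([np.cos(theta), np.sin(theta)], axis=1)
    if ndim == 3:
        # Fibonacci sphere: evenly spaced heights, azimuth advanced by the golden angle.
        z = 1.0 - 2.0 * (k + 0.5) / n_rays
        r = np.sqrt(1.0 - z * z)
        phi = k * np.pi * (3.0 - np.sqrt(5.0))
        return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)
    raise ValueError("ndim must be 2 or 3")

rays_2d = unit_rays(8, 2)   # shape (8, 2)
rays_3d = unit_rays(32, 3)  # shape (32, 3)
```

Every row has unit length by construction, since x^2 + y^2 = r^2 = 1 - z^2 in the 3D case.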
- biapy.data.pre_processing.radial_distances(labels: ndarray[tuple[int, ...], dtype[_ScalarType_co]], rays: ndarray[tuple[int, ...], dtype[_ScalarType_co]], max_dist: float | None = None, spacing: Sequence[float] | None = None, max_iters: int = 50) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Compute radial distances from each foreground pixel to the instance boundary along the specified rays.
- Parameters:
labels (NDArray) – 2D or 3D array of instance labels (0 = background, 1..N = instances).
rays ((n_rays, 2) or (n_rays, 3) Numpy array) – Unit vectors along which to compute distances.
max_dist (float, optional) – Maximum distance to cap at. If None, no capping is done.
spacing (sequence of float, optional) – Physical spacing of the data in each dimension. If None, isotropic spacing of 1.0 is assumed.
max_iters (int) – Maximum number of steps to march along each ray.
- Returns:
D – Array of shape (H, W, n_rays) or (D, H, W, n_rays) with distances in physical units. Background pixels have distance 0 in all rays.
- Return type:
NDArray
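The marching procedure can be illustrated with a small 2D sketch: from every foreground pixel, step along a ray one unit at a time and count how many steps stay inside the same instance. The sketch assumes unit steps, nearest-pixel lookup, rays given as [dx, dy], and no `spacing` or `max_dist` handling; `march_rays_2d` is a hypothetical name and the real function may differ in all of these details.

```python
import numpy as np

def march_rays_2d(labels: np.ndarray, rays: np.ndarray, max_iters: int = 50) -> np.ndarray:
    """Per-pixel number of unit steps along each ray that stay inside the instance."""
    h, w = labels.shape
    ys, xs = np.nonzero(labels)          # foreground pixel coordinates
    ids = labels[ys, xs]                 # instance ID of each foreground pixel
    out = np.zeros((h, w, len(rays)), dtype=np.float32)
    for r, (dx, dy) in enumerate(rays):
        dist = np.zeros(len(ys), dtype=np.float32)
        alive = np.ones(len(ys), dtype=bool)  # rays that have not left their instance
        for k in range(1, max_iters + 1):
            py = np.rint(ys + k * dy).astype(int)
            px = np.rint(xs + k * dx).astype(int)
            inside = (py >= 0) & (py < h) & (px >= 0) & (px < w)
            same = np.zeros(len(ys), dtype=bool)
            same[inside] = labels[py[inside], px[inside]] == ids[inside]
            alive &= same
            if not alive.any():
                break
            dist[alive] = k
        out[ys, xs, r] = dist
    return out

row = np.array([[0, 1, 1, 1, 0]])
d = march_rays_2d(row, np.array([[1.0, 0.0]]))
```

Background pixels are never written to, so they keep a distance of 0 for every ray, matching the behavior described for the return value.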
- biapy.data.pre_processing.euler_integration(flow: ndarray[tuple[int, ...], dtype[_ScalarType_co]], coords: ndarray[tuple[int, ...], dtype[_ScalarType_co]], n_steps: int = 200, dt: float = 1.0, suppressed: bool = True)
Euler integration of a flow field starting at coords.
- Parameters:
flow ((2, H, W) or (3, D, H, W) Numpy array) – Flow field (y,x) or (z,y,x).
coords ((N, 2) or (N, 3) Numpy array) – Starting coordinates (y,x) or (z,y,x) in index space.
n_steps (int) – Number of integration steps.
dt (float) – Integration step size.
suppressed (bool) – Whether to use time-suppressed integration (step size dt/(t+1)) or a constant step size dt.
- Returns:
pos – Final positions after integration.
- Return type:
(N, 2) or (N, 3) Numpy array
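A minimal sketch of the update rule described by these parameters is shown below. It assumes nearest-neighbor sampling of the flow and clamping of positions to the array bounds, neither of which is stated in this reference; `euler_integrate` is a hypothetical name.

```python
import numpy as np

def euler_integrate(flow, coords, n_steps=200, dt=1.0, suppressed=True):
    """Move points along a flow field with explicit Euler steps."""
    upper = np.array(flow.shape[1:]) - 1        # last valid index per axis
    pos = np.asarray(coords, dtype=np.float64).copy()
    for t in range(n_steps):
        # Sample the flow at the nearest grid point of each position.
        idx = np.clip(np.rint(pos).astype(int), 0, upper)
        v = flow[(slice(None),) + tuple(idx.T)]  # shape (ndim, N)
        # Time-suppressed variant shrinks the step as dt / (t + 1).
        step = dt / (t + 1) if suppressed else dt
        pos = np.clip(pos + step * v.T, 0, upper)
    return pos

# Constant flow of one pixel per step along x on a 10 x 10 grid.
flow = np.zeros((2, 10, 10))
flow[1] = 1.0
end_const = euler_integrate(flow, [[5, 2]], n_steps=3, suppressed=False)
end_supp = euler_integrate(flow, [[5, 2]], n_steps=3, suppressed=True)
```

With the constant step the point advances 3 pixels in x, whereas the suppressed variant advances 1 + 1/2 + 1/3 pixels, which illustrates how suppression damps later steps.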
- biapy.data.pre_processing.synapse_channel_creation(data_info: Dict, zarr_data_information: Dict, savepath: str, mode: List[str] = ['F_pre', 'F_post'], channel_extra_opts: Dict[str, Dict] = {}, verbose: bool = False)
Create different channels that represent a synapse segmentation problem to train an instance segmentation problem.
This function is only prepared to read an H5/Zarr file that follows the CREMI data format.
- Parameters:
data_info (dict) – All patches that can be extracted from all the Zarr/H5 samples in data_path. Keys created are:
- "filepath": path to the file from which the patch was extracted.
- "full_shape": shape of the data within the file from which the patch was extracted.
- "patch_coords": coordinates of the data that represents the patch.
zarr_data_information (dict) – Information used when working with Zarr/H5 files. Assumes that the H5/Zarr files are organized according to the CREMI data format. The following keys are expected:
- "raw_data_path": path within the file where the raw data is stored. Reference in CREMI: volumes/raw
- "axes_order": order of the axes in the file. E.g. 'ZYX' or 'ZCYX'.
- "z_axe_pos": position of the z axis of the data within the file.
- "y_axe_pos": position of the y axis of the data within the file.
- "x_axe_pos": position of the x axis of the data within the file.
- "id_path": path within the file where the ids are stored. Reference in CREMI: annotations/ids
- "partners_path": path within the file where partners is stored. Reference in CREMI: annotations/partners
- "locations_path": path within the file where locations is stored. Reference in CREMI: annotations/locations
- "resolution_path": path within the file where resolution is stored. Reference in CREMI: ["volumes/raw"].attrs["offset"]
savepath (str) – Path to save the data created.
mode (List, optional) – Operation mode.
channel_extra_opts (dict, optional) – Extra options for specific channels. For example, dilation for the 'F_pre' and 'F_post' channels. Expected keys are:
- "F_pre": options for the 'F_pre' channel. Expected keys are: "dilation": list of 3 ints specifying the dilation in z,y,x for the 'F_pre' channel (default: [1,10,10]).
- "F_post": options for the 'F_post' channel. Expected keys are: "dilation": list of 3 ints specifying the dilation in z,y,x for the 'F_post' channel (default: [1,10,10]).
- "H", "V", "Z": options for the distance channels. Expected keys are: "norm": whether to normalize the distance channels per instance (default: True).
verbose (bool, optional) – Whether to print warnings about out-of-bounds synaptic points (default: False).
- Returns:
new_mask (5D Numpy array) – 5D array with 3 channels instead of one. E.g. (10, 200, 1000, 1000, 3).
patch_offset (list of list) – Pixels used on each axis to pad the patch so that values at the edges are not cut.
- biapy.data.pre_processing.create_HoVe_channels(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]], ref_point: str = 'center', label_to_pre_site: Dict | None = None, normalize_values: bool = True, calc_props: Dict | None = None, axis_order: str = 'ZYX', resolution: List[float | int] = [1, 1, 1])
Obtain the horizontal and vertical distance maps for each instance.
Depth distance is also calculated if the data provided is 3D.
- Parameters:
data (2D/3D Numpy array) – Instance mask to create horizontal/vertical/depth channels from. E.g. (500, 500) for 2D and (200, 1000, 1000) for 3D.
ref_point (str, optional) – Reference point used to create the horizontal/vertical/depth channels. Possible values: center, presynaptic. Details:
- 'center': point to the centroid.
- 'presynaptic': point to the presynaptic site. To use this, label_to_pre_site must be provided.
label_to_pre_site (dict, optional) – Reference of the presynaptic site for each label within the provided volume (data).
normalize_values (bool, optional) – Whether to normalize the values or not.
calc_props (dict, optional) – If region properties have already been calculated, they can be provided here to avoid recalculation.
resolution (list of int or float, optional) – Physical resolution of the data in each dimension. Used to scale the horizontal/vertical/depth values to physical units if provided. Default is [1,1,1] (isotropic).
- Returns:
new_mask – Horizontal/vertical/depth channels. E.g. (500, 500, 2) for 2D and (200, 1000, 1000, 3) for 3D.
- Return type:
3D/4D Numpy array
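The idea behind horizontal and vertical maps with a centroid reference can be sketched in 2D as follows. The sketch assumes channel 0 holds the horizontal (x) offset and channel 1 the vertical (y) offset, and normalizes each instance by its largest absolute offset; the actual channel order and normalization used by `create_HoVe_channels` are not stated here, and `hv_maps_2d` is a hypothetical name.

```python
import numpy as np

def hv_maps_2d(inst: np.ndarray) -> np.ndarray:
    """Signed, per-instance normalized offsets from the centroid, shape (H, W, 2)."""
    hv = np.zeros(inst.shape + (2,), dtype=np.float32)
    for lab in np.unique(inst):
        if lab == 0:
            continue  # background stays at zero
        ys, xs = np.nonzero(inst == lab)
        for ch, c in enumerate((xs, ys)):  # assumed order: horizontal, then vertical
            d = c - c.mean()               # signed offset from the centroid
            m = np.abs(d).max()
            if m > 0:
                d = d / m                  # scale into [-1, 1]
            hv[ys, xs, ch] = d
    return hv

inst = np.zeros((3, 5), dtype=int)
inst[1, 1:4] = 1
hv = hv_maps_2d(inst)
```

For this one-row instance the horizontal channel runs from -1 to 1 across the object, while the vertical channel is zero everywhere because all pixels share the centroid's row.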
- biapy.data.pre_processing.generate_ellipse_footprint(shape=[1, 1, 1]) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Generate the footprint of an ellipse in an n-dimensional image.
- Parameters:
shape (list, optional) – Shape of the hyperball with the given side lengths.
- Returns:
distances – Ellipse footprint.
- Return type:
NDArray
- biapy.data.pre_processing.create_detection_masks(cfg: CfgNode, data_type: str = 'train')
Create detection masks based on CSV files.
- Parameters:
cfg (YACS CN object) – Configuration.
data_type (str, optional) – Whether to create train, validation or test masks.
- biapy.data.pre_processing.create_ssl_source_data_masks(cfg: CfgNode, data_type: str = 'train')
Create SSL source data.
- Parameters:
cfg (YACS CN object) – Configuration.
data_type (str, optional) – Whether to create train, validation or test source data.
- biapy.data.pre_processing.crappify(input_img: ndarray[tuple[int, ...], dtype[_ScalarType_co]], resizing_factor: float, add_noise: bool = True, noise_level: float | None = None, Down_up: bool = True)
Crappify the input image by adding Gaussian noise and then downsampling and upsampling it, so that its resolution is degraded.
- Parameters:
input_img (4D/5D Numpy array) – Data to be modified. E.g. (y, x, channels) if working with 2D images or (z, y, x, channels) if working with 3D.
resizing_factor (float) – Downsizing factor used to reshape the image.
add_noise (bool, optional) – Whether to add Gaussian noise before applying the resizing.
noise_level (float, optional) – Number in [0,1] indicating the std of the Gaussian noise N(0,std).
Down_up (bool, optional) – Whether to perform a final upsampling operation to obtain an image of the same size as the original, but with the loss of quality caused by downsizing and upsizing.
- Returns:
img – Train images. E.g. (y, x, channels) if working with 2D images or (z, y, x, channels) if working with 3D.
- Return type:
4D/5D Numpy array
- biapy.data.pre_processing.add_gaussian_noise(image: ndarray[tuple[int, ...], dtype[_ScalarType_co]], percentage_of_noise: float) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Add Gaussian noise to an input image.
- Parameters:
image (3D Numpy array) – Image to which zero-mean Gaussian noise with a given std is added. E.g. (y, x, channels).
percentage_of_noise (float) – Percentage of the maximum value of the image that is used as the std of the Gaussian noise distribution.
- Returns:
out – Transformed image. E.g. (y, x, channels).
- Return type:
3D Numpy array
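The relation between `percentage_of_noise` and the noise std can be sketched as below. The sketch assumes the percentage is given as a fraction (0.1 meaning 10% of the maximum), returns a float image without clipping, and takes a `seed` argument only for reproducibility; none of these details are confirmed by this reference, and `add_noise_fraction` is a hypothetical name.

```python
import numpy as np

def add_noise_fraction(image: np.ndarray, percentage_of_noise: float, seed=None) -> np.ndarray:
    """Add zero-mean Gaussian noise whose std is a fraction of the image maximum."""
    rng = np.random.default_rng(seed)
    # Assumed convention: std = fraction * max intensity of the image.
    std = percentage_of_noise * float(image.max())
    noise = rng.normal(loc=0.0, scale=std, size=image.shape)
    return image.astype(np.float64) + noise

clean = np.full((100, 100, 1), 10.0)
noisy = add_noise_fraction(clean, 0.1, seed=0)  # std of roughly 1.0
```

Tying the std to the image maximum keeps the relative noise strength comparable between, for example, 8-bit and 16-bit inputs.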
- biapy.data.pre_processing.calculate_volume_prob_map(Y: BiaPyDataset, is_3d: bool = False, w_foreground: float = 0.94, w_background: float = 0.06, save_dir=None) -> List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]] | ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Calculate the probability map of the given data.
- Parameters:
Y (list of dict) – Data to calculate the probability map from. Each item in the list represents a sample of the dataset. Expected keys:
- "filename": name of the image to extract the data sample from.
- "dir": directory where the image resides.
- "img": the image sample itself. It is an ndarray of shape (y, x, channels) in 2D and (z, y, x, channels) in 3D. Provided if the user selected to load data into memory.
If "img" is provided, "filename" and "dir" are not necessary, and vice versa.
w_foreground (float, optional) – Weight of the foreground. This value plus w_background must equal 1.
w_background (float, optional) – Weight of the background. This value plus w_foreground must equal 1.
save_dir (str, optional) – Path to the file where the probability map will be stored.
- Returns:
maps – Probability map(s) of all samples in Y.sample_list.
- Return type:
NDArray or list of NDArray
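One way to read the two weights is as the total probability mass given to foreground and background pixels. The sketch below works under that assumption: it spreads `w_foreground` evenly over foreground pixels and `w_background` evenly over background pixels, then normalizes to sum to 1. The exact computation in `calculate_volume_prob_map` may differ, and `foreground_prob_map` is a hypothetical name.

```python
import numpy as np

def foreground_prob_map(mask: np.ndarray, w_foreground: float = 0.94,
                        w_background: float = 0.06) -> np.ndarray:
    """Sampling probability per pixel, favoring foreground (illustrative)."""
    fg = mask > 0
    n_fg, n_bg = int(fg.sum()), int((~fg).sum())
    pm = np.zeros(mask.shape, dtype=np.float64)
    if n_fg:
        pm[fg] = w_foreground / n_fg   # foreground shares w_foreground in total
    if n_bg:
        pm[~fg] = w_background / n_bg  # background shares w_background in total
    return pm / pm.sum()               # renormalize in case one class is empty

mask = np.array([[1, 0],
                 [0, 0]])
pm = foreground_prob_map(mask)
```

With a single foreground pixel out of four, that pixel gets probability 0.94 and each background pixel 0.02, so patches centered on the object are sampled far more often.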
- biapy.data.pre_processing.resize_images(images: List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]], **kwards) -> List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]
Resize all the images using the specified parameters, or default values if not provided.
- Parameters:
images (list of Numpy arrays) – List of input images to resize.
output_shape (iterable) – Size of the generated output image. E.g. (256,256)
kwards (optional) – skimage.transform.resize() parameters are also allowed.
- Returns:
resized_images – The resized images. The returned data will use the same data type as the given images.
- Return type:
list of Numpy arrays
- biapy.data.pre_processing.apply_gaussian_blur(images: List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]], **kwards) -> List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]
Apply a Gaussian blur to all images.
- Parameters:
images (list of Numpy arrays) – The input images to which the Gaussian blur will be applied.
kwards (optional) – skimage.filters.gaussian() parameters are also allowed.
- Returns:
blurred_images – The Gaussian-blurred images. The returned data will use the same data type as the given images.
- Return type:
list of Numpy arrays
- biapy.data.pre_processing.apply_median_blur(images: List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]], **kwards) -> List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]
Apply a median blur filter to all images.
- Parameters:
images (list of Numpy arrays) – The input images to which the median blur will be applied.
kwards (optional) – skimage.filters.median() parameters are also allowed.
- Returns:
blurred_images – The median-blurred images. The returned data will use the same data type as the given images.
- Return type:
list of Numpy arrays
- biapy.data.pre_processing.detect_edges(images: List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]], **kwards) -> List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]
Detect edges in the given images using the Canny edge detection algorithm.
The function takes 2D images as input, converts them to grayscale if necessary, and applies the Canny edge detection algorithm to each of them.
- Parameters:
images (list of Numpy arrays) – The list of input images on which the edge detection will be performed. Each one can be either a color image with shape (height, width, 3) or a grayscale image with shape (height, width, 1).
kwards (optional) – skimage.feature.canny() parameters are also allowed.
- Returns:
edges – The edges of the input images. The returned Numpy arrays will be uint8, where background is black (0) and edges are white (255). The returned data will use the same structure as the given images (list[Numpy array] or Numpy array).
- Return type:
list of Numpy arrays
- biapy.data.pre_processing.apply_histogram_matching(images: List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]], reference_path: str, is_2d: bool)
Apply histogram matching to a list of images based on the histogram of reference images.
The function returns the images with their histogram matched to the histogram of the reference images, loaded from the given reference_path.
- Parameters:
images (list of Numpy arrays) – The list of input images whose histogram is to be matched to the reference histogram. Each item is a Numpy array representing an image.
reference_path (str) – Directory path to the reference images. The reference histogram is extracted from these images and represents the desired distribution of pixel intensities in the output images.
is_2d (bool) – Whether the data in reference_path is 2D (is_2d = True) or 3D (is_2d = False).
- Returns:
matched_images – The result of matching the histogram of the input images to the histogram of the reference images. The returned data will use the same data type as the given images.
- Return type:
list of Numpy arrays
- biapy.data.pre_processing.apply_clahe(images: List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]], **kwards) -> List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]
Apply Contrast Limited Adaptive Histogram Equalization (CLAHE) to a list of images.
The function applies CLAHE to each image and returns the results.
- Parameters:
images (list of Numpy arrays) – The list of input images to which CLAHE will be applied.
kwards (optional) – skimage.exposure.equalize_adapthist() parameters are also allowed.
- Returns:
processed_images – The images after applying CLAHE. The returned data will use the same data type as the given images.
- Return type:
list of Numpy arrays
- biapy.data.pre_processing.preprocess_data(cfg: CfgNode, x_data: List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]] = [], y_data: List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]] = [], is_2d: bool = True, is_y_mask: bool = False) -> List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]] | Tuple[List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]], List[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]]
Pre-process data by applying various image processing techniques.
- Parameters:
cfg (YACS CN object) – Configuration object with the pre-processing settings. It controls the behavior of the different pre-processing techniques, such as image resizing, blurring and histogram matching.
x_data (list of 3D/4D Numpy arrays, optional) – The input data (images) to be preprocessed. The first dimension must be the number of images. E.g. (num_of_images, y, x, channels) or (num_of_images, z, y, x, channels). When a list is used, the format of the images remains the same and each item in the list corresponds to a different image.
y_data (list of 3D/4D Numpy arrays, optional) – The target data that corresponds to x_data. The first dimension must be the number of images. E.g. (num_of_images, y, x, channels) or (num_of_images, z, y, x, channels). When a list is used, the format of the images remains the same and each item in the list corresponds to a different image.
is_2d (bool, optional) – Whether the reference data for histogram matching is 2D or not. Defaults to True.
is_y_mask (bool, optional) – Whether y_data is a mask or not. If True, the resize operation for y_data uses nearest-neighbor interpolation (order=0); otherwise it uses the interpolation method specified in the cfg.RESIZE.ORDER parameter. Defaults to False.
- Returns:
x_data (list of 3D/4D Numpy arrays, optional) – Preprocessed data, with the same structure and dimensionality as the given data.
y_data (list of 3D/4D Numpy arrays, optional) – Preprocessed data, with the same structure and dimensionality as the given data.