2D Data manipulation

biapy.data.data_2D_manipulation.load_and_prepare_2D_train_data(train_path, train_mask_path, cross_val=False, cross_val_nsplits=5, cross_val_fold=1, val_split=0.1, seed=0, shuffle_val=True, num_crops_per_dataset=0, random_crops_in_DA=False, crop_shape=None, y_upscaling=(1, 1), ov=(0, 0), padding=(0, 0), minimum_foreground_perc=-1, reflect_to_complete_shape=False, convert_to_rgb=False, preprocess_cfg=None, is_y_mask=False, preprocess_f=None)[source]

Load train and validation images from the given paths to create 2D data.

Parameters:

train_path (str) – Path to the training data.
train_mask_path (str) – Path to the training data masks.
cross_val (bool, optional) – Whether to use cross validation or not.
cross_val_nsplits (int, optional) – Number of folds for the cross validation.
cross_val_fold (int, optional) – Number of the fold to be used as validation.
val_split (float, optional) – % of the train data used as validation (value between 0 and 1).
seed (int, optional) – Seed value.
shuffle_val (bool, optional) – Take random training examples to create validation data.
num_crops_per_dataset (int, optional) – Number of crops per extra dataset to take into account. Useful to ensure that all the datasets have the same weight during network trainning.
random_crops_in_DA (bool, optional) – To advice the method that not preparation of the data must be done, as random subvolumes will be created on DA, and the whole volume will be used for that.
crop_shape (3D int tuple, optional) – Shape of the crops. E.g. (y, x, channels).
y_upscaling (2 int tuple, optional) – Upscaling to be done when loading Y data. User for super-resolution workflow.
ov (2 floats tuple, optional) – Amount of minimum overlap on x and y dimensions. The values must be on range [0, 1), that is, 0% or 99% of overlap. E.g. (y, x).
padding (tuple of ints, optional) – Size of padding to be added on each axis (y, x). E.g. (24, 24)
minimum_foreground_perc (float, optional) – Minimum percetnage of foreground that a sample need to have no not be discarded.
reflect_to_complete_shape (bool, optional) – Wheter to increase the shape of the dimension that have less size than selected patch size padding it with ‘reflect’.
convert_to_rgb (bool, optional) – In case RGB images are expected, e.g. if crop_shape channel is 3, those images that are grayscale are converted into RGB.
preprocess_cfg (dict, optional) – Configuration parameters for preprocessing, is necessary in case you want to apply any preprocessing.
is_y_mask (bool, optional) – Whether the data are masks. It is used to control the preprocessing of the data.
preprocess_f (function, optional) – The preprocessing function, is necessary in case you want to apply any preprocessing.

Returns:

X_train (4D Numpy array) – Train images. E.g. (num_of_images, y, x, channels).
Y_train (4D Numpy array) – Train images’ mask. E.g. (num_of_images, y, x, channels).
X_val (4D Numpy array, optional) – Validation images (val_split > 0). E.g. (num_of_images, y, x, channels).
Y_val (4D Numpy array, optional) – Validation images’ mask (val_split > 0). E.g. (num_of_images, y, x, channels).
filenames (List of str) – Loaded train filenames.
val_index (List of ints) – Indexes of the samples beloging to the validation.

Examples

# EXAMPLE 1
# Case where we need to load the data (creating a validation split)
train_path = "data/train/x"
train_mask_path = "data/train/y"

# Original image shape is (1024, 768, 165), so each image shape should be this:
img_train_shape = (1024, 768, 1)

X_train, Y_train, X_val,
Y_val, crops_made = load_and_prepare_2D_data(train_path, train_mask_path, img_train_shape, val_split=0.1,
    shuffle_val=True, make_crops=False)

# The function will print the shapes of the generated arrays. In this example:
#     *** Loaded train data shape is: (148, 768, 1024, 1)
#     *** Loaded validation data shape is: (17, 768, 1024, 1)
#
# Notice height and width swap because of Numpy ndarray terminology

# EXAMPLE 2
# Same as the first example but creating patches of (256x256)
X_train, Y_train, X_val,
Y_val, crops_made = load_and_prepare_2D_data(train_path, train_mask_path, img_train_shape, val_split=0.1,
    shuffle_val=True, make_crops=True, crop_shape=(256, 256, 1))

# The function will print the shapes of the generated arrays. In this example:
#    *** Loaded train data shape is: (1776, 256, 256, 1)
#    *** Loaded validation data shape is: (204, 256, 256, 1)

biapy.data.data_2D_manipulation.crop_data_with_overlap(data, crop_shape, data_mask=None, overlap=(0, 0), padding=(0, 0), verbose=True)[source]

Crop data into small square pieces with overlap. The difference with crop_data() is that this function allows you to create patches with overlap.

The opposite function is merge_data_with_overlap().

Parameters:

data (4D Numpy array) – Data to crop. E.g. (num_of_images, y, x, channels).
crop_shape (3 int tuple) – Shape of the crops to create. E.g. (y, x, channels).
data_mask (4D Numpy array, optional) – Data mask to crop. E.g. (num_of_images, y, x, channels).
overlap (Tuple of 2 floats, optional) – Amount of minimum overlap on x and y dimensions. The values must be on range [0, 1), that is, 0% or 99% of overlap. E. g. (y, x).
padding (tuple of ints, optional) – Size of padding to be added on each axis (y, x). E.g. (24, 24).
verbose (bool, optional) – To print information about the crop to be made.

Returns:

cropped_data (4D Numpy array) – Cropped image data. E.g. (num_of_images, y, x, channels).
cropped_data_mask (4D Numpy array, optional) – Cropped image data masks. E.g. (num_of_images, y, x, channels).

Examples

# EXAMPLE 1
# Divide in crops of (256, 256) a given data with the minimum overlap
X_train = np.ones((165, 768, 1024, 1))
Y_train = np.ones((165, 768, 1024, 1))

X_train, Y_train = crop_data_with_overlap(X_train, (256, 256, 1), Y_train, (0, 0))

# Notice that as the shape of the data has exact division with the wnanted crops shape so no overlap will be
# made. The function will print the following information:
#     Minimum overlap selected: (0, 0)
#     Real overlapping (%): (0.0, 0.0)
#     Real overlapping (pixels): (0.0, 0.0)
#     (3, 4) patches per (x,y) axis
#     **** New data shape is: (1980, 256, 256, 1)

# EXAMPLE 2
# Same as example 1 but with 25% of overlap between crops
X_train, Y_train = crop_data_with_overlap(X_train, (256, 256, 1), Y_train, (0.25, 0.25))

# The function will print the following information:
#     Minimum overlap selected: (0.25, 0.25)
#     Real overlapping (%): (0.33203125, 0.3984375)
#     Real overlapping (pixels): (85.0, 102.0)
#     (4, 6) patches per (x,y) axis
#     **** New data shape is: (3960, 256, 256, 1)

# EXAMPLE 3
# Same as example 1 but with 50% of overlap between crops
X_train, Y_train = crop_data_with_overlap(X_train, (256, 256, 1), Y_train, (0.5, 0.5))

# The function will print the shape of the created array. In this example:
#     Minimum overlap selected: (0.5, 0.5)
#     Real overlapping (%): (0.59765625, 0.5703125)
#     Real overlapping (pixels): (153.0, 146.0)
#     (6, 8) patches per (x,y) axis
#     **** New data shape is: (7920, 256, 256, 1)

# EXAMPLE 4
# Same as example 2 but with 50% of overlap only in x axis
X_train, Y_train = crop_data_with_overlap(X_train, (256, 256, 1), Y_train, (0.5, 0))

# The function will print the shape of the created array. In this example:
#     Minimum overlap selected: (0.5, 0)
#     Real overlapping (%): (0.59765625, 0.0)
#     Real overlapping (pixels): (153.0, 0.0)
#     (6, 4) patches per (x,y) axis
#     **** New data shape is: (3960, 256, 256, 1)

biapy.data.data_2D_manipulation.merge_data_with_overlap(data, original_shape, data_mask=None, overlap=(0, 0), padding=(0, 0), verbose=True, out_dir=None, prefix='')[source]

Merge data with an amount of overlap.

The opposite function is crop_data_with_overlap().

Parameters:

data (4D Numpy array) – Data to merge. E.g. (num_of_images, y, x, channels).
original_shape (4D int tuple) – Shape of the original data. E.g. (num_of_images, y, x, channels)
data_mask (4D Numpy array, optional) – Data mask to merge. E.g. (num_of_images, y, x, channels).
overlap (Tuple of 2 floats, optional) – Amount of minimum overlap on x and y dimensions. Should be the same as used in crop_data_with_overlap(). The values must be on range [0, 1), that is, 0% or 99% of overlap. E. g. (y, x).
padding (tuple of ints, optional) – Size of padding to be added on each axis (y, x). E.g. (24, 24).
verbose (bool, optional) – To print information about the crop to be made.
out_dir (str, optional) – If provided an image that represents the overlap made will be saved. The image will be colored as follows: green region when ==2 crops overlap, yellow when 2 < x < 6 and red when =<6 or more crops are merged.
prefix (str, optional) – Prefix to save overlap map with.

Returns:

merged_data (4D Numpy array) – Merged image data. E.g. (num_of_images, y, x, channels).
merged_data_mask (4D Numpy array, optional) – Merged image data mask. E.g. (num_of_images, y, x, channels).

Examples

# EXAMPLE 1
# Merge the data of example 1 of 'crop_data_with_overlap' function

# 1) CROP
X_train = np.ones((165, 768, 1024, 1))
Y_train = np.ones((165, 768, 1024, 1))
X_train, Y_train = crop_data_with_overlap(X_train, (256, 256, 1), Y_train, (0, 0))

# 2) MERGE
X_train, Y_train = merge_data_with_overlap(
    X_train, (165, 768, 1024, 1), Y_train, (0, 0), out_dir='out_dir')

# The function will print the following information:
#     Minimum overlap selected: (0, 0)
#     Real overlapping (%): (0.0, 0.0)
#     Real overlapping (pixels): (0.0, 0.0)
#     (3, 4) patches per (x,y) axis
#     **** New data shape is: (165, 768, 1024, 1)

# EXAMPLE 2
# Merge the data of example 2 of 'crop_data_with_overlap' function
X_train, Y_train = merge_data_with_overlap(
     X_train, (165, 768, 1024, 1), Y_train, (0.25, 0.25), out_dir='out_dir')

# The function will print the following information:
#     Minimum overlap selected: (0.25, 0.25)
#     Real overlapping (%): (0.33203125, 0.3984375)
#     Real overlapping (pixels): (85.0, 102.0)
#     (3, 5) patches per (x,y) axis
#     **** New data shape is: (165, 768, 1024, 1)

# EXAMPLE 3
# Merge the data of example 3 of 'crop_data_with_overlap' function
X_train, Y_train = merge_data_with_overlap(
    X_train, (165, 768, 1024, 1), Y_train, (0.5, 0.5), out_dir='out_dir')

# The function will print the shape of the created array. In this example:
#     Minimum overlap selected: (0.5, 0.5)
#     Real overlapping (%): (0.59765625, 0.5703125)
#     Real overlapping (pixels): (153.0, 146.0)
#     (6, 8) patches per (x,y) axis
#     **** New data shape is: (165, 768, 1024, 1)

# EXAMPLE 4
# Merge the data of example 1 of 'crop_data_with_overlap' function
X_train, Y_train = merge_data_with_overlap(
    X_train, (165, 768, 1024, 1), Y_train, (0.5, 0), out_dir='out_dir')

# The function will print the shape of the created array. In this example:
#     Minimum overlap selected: (0.5, 0)
#     Real overlapping (%): (0.59765625, 0.0)
#     Real overlapping (pixels): (153.0, 0.0)
#     (6, 4) patches per (x,y) axis
#     **** New data shape is: (165, 768, 1024, 1)

As example of different overlap maps are presented below.

Example 1 overlapping map	Example 2 overlapping map
Example 3 overlapping map	Example 4 overlapping map

biapy.data.data_2D_manipulation.load_data_classification(data_dir, patch_shape, convert_to_rgb=True, expected_classes=None, cross_val=False, cross_val_nsplits=5, cross_val_fold=1, val_split=0.1, seed=0, shuffle_val=True)[source]

Load data to train classification methods.

Parameters:

data_dir (str) – Path to the training data.
patch_shape (Tuple of ints) – Shape of the patch. E.g. (y, x, channels).
convert_to_rgb (bool, optional) – In case RGB images are expected, e.g. if crop_shape channel is 3, those images that are grayscale are converted into RGB.
expected_classes (int, optional) – Expected number of classes to be loaded.
cross_val (bool, optional) – Whether to use cross validation or not.
cross_val_nsplits (int, optional) – Number of folds for the cross validation.
cross_val_fold (int, optional) – Number of the fold to be used as validation.
val_split (float, optional) – % of the train data used as validation (value between 0 and 1).
seed (int, optional) – Seed value.
shuffle_val (bool, optional) – Take random training examples to create validation data.

Returns:

X_data (4D Numpy array) – Train images. E.g. (num_of_images, y, x, channels).
Y_data (4D Numpy array) – Train images’ mask. E.g. (num_of_images, y, x, channels).
X_val (4D Numpy array, optional) – Validation images (val_split > 0). E.g. (num_of_images, y, x, channels).
Y_val (4D Numpy array, optional) – Validation images’ mask (val_split > 0). E.g. (num_of_images, y, x, channels).
all_ids (List of str) – Loaded data filenames.
val_index (List of ints) – Indexes of the samples beloging to the validation.