2D Data manipulation
- biapy.data.data_2D_manipulation.load_and_prepare_2D_train_data(train_path, train_mask_path, cross_val=False, cross_val_nsplits=5, cross_val_fold=1, val_split=0.1, seed=0, shuffle_val=True, num_crops_per_dataset=0, random_crops_in_DA=False, crop_shape=None, y_upscaling=(1, 1), ov=(0, 0), padding=(0, 0), minimum_foreground_perc=-1, reflect_to_complete_shape=False, convert_to_rgb=False, preprocess_cfg=None, is_y_mask=False, preprocess_f=None)[source]
Load train and validation images from the given paths to create 2D data.
- Parameters:
train_path (str) – Path to the training data.
train_mask_path (str) – Path to the training data masks.
cross_val (bool, optional) – Whether to use cross validation or not.
cross_val_nsplits (int, optional) – Number of folds for the cross validation.
cross_val_fold (int, optional) – Number of the fold to be used as validation.
val_split (float, optional) – % of the train data used as validation (value between
0
and1
).seed (int, optional) – Seed value.
shuffle_val (bool, optional) – Take random training examples to create validation data.
num_crops_per_dataset (int, optional) – Number of crops per extra dataset to take into account. Useful to ensure that all the datasets have the same weight during network trainning.
random_crops_in_DA (bool, optional) – To advice the method that not preparation of the data must be done, as random subvolumes will be created on DA, and the whole volume will be used for that.
crop_shape (3D int tuple, optional) – Shape of the crops. E.g.
(y, x, channels)
.y_upscaling (2 int tuple, optional) – Upscaling to be done when loading Y data. User for super-resolution workflow.
ov (2 floats tuple, optional) – Amount of minimum overlap on x and y dimensions. The values must be on range
[0, 1)
, that is,0%
or99%
of overlap. E.g.(y, x)
.padding (tuple of ints, optional) – Size of padding to be added on each axis
(y, x)
. E.g.(24, 24)
minimum_foreground_perc (float, optional) – Minimum percetnage of foreground that a sample need to have no not be discarded.
reflect_to_complete_shape (bool, optional) – Wheter to increase the shape of the dimension that have less size than selected patch size padding it with ‘reflect’.
convert_to_rgb (bool, optional) – In case RGB images are expected, e.g. if
crop_shape
channel is 3, those images that are grayscale are converted into RGB.preprocess_cfg (dict, optional) – Configuration parameters for preprocessing, is necessary in case you want to apply any preprocessing.
is_y_mask (bool, optional) – Whether the data are masks. It is used to control the preprocessing of the data.
preprocess_f (function, optional) – The preprocessing function, is necessary in case you want to apply any preprocessing.
- Returns:
X_train (4D Numpy array) – Train images. E.g.
(num_of_images, y, x, channels)
.Y_train (4D Numpy array) – Train images’ mask. E.g.
(num_of_images, y, x, channels)
.X_val (4D Numpy array, optional) – Validation images (
val_split > 0
). E.g.(num_of_images, y, x, channels)
.Y_val (4D Numpy array, optional) – Validation images’ mask (
val_split > 0
). E.g.(num_of_images, y, x, channels)
.filenames (List of str) – Loaded train filenames.
val_index (List of ints) – Indexes of the samples beloging to the validation.
Examples
# EXAMPLE 1 # Case where we need to load the data (creating a validation split) train_path = "data/train/x" train_mask_path = "data/train/y" # Original image shape is (1024, 768, 165), so each image shape should be this: img_train_shape = (1024, 768, 1) X_train, Y_train, X_val, Y_val, crops_made = load_and_prepare_2D_data(train_path, train_mask_path, img_train_shape, val_split=0.1, shuffle_val=True, make_crops=False) # The function will print the shapes of the generated arrays. In this example: # *** Loaded train data shape is: (148, 768, 1024, 1) # *** Loaded validation data shape is: (17, 768, 1024, 1) # # Notice height and width swap because of Numpy ndarray terminology # EXAMPLE 2 # Same as the first example but creating patches of (256x256) X_train, Y_train, X_val, Y_val, crops_made = load_and_prepare_2D_data(train_path, train_mask_path, img_train_shape, val_split=0.1, shuffle_val=True, make_crops=True, crop_shape=(256, 256, 1)) # The function will print the shapes of the generated arrays. In this example: # *** Loaded train data shape is: (1776, 256, 256, 1) # *** Loaded validation data shape is: (204, 256, 256, 1)
- biapy.data.data_2D_manipulation.crop_data_with_overlap(data, crop_shape, data_mask=None, overlap=(0, 0), padding=(0, 0), verbose=True)[source]
Crop data into small square pieces with overlap. The difference with
crop_data()
is that this function allows you to create patches with overlap.The opposite function is
merge_data_with_overlap()
.- Parameters:
data (4D Numpy array) – Data to crop. E.g.
(num_of_images, y, x, channels)
.crop_shape (3 int tuple) – Shape of the crops to create. E.g.
(y, x, channels)
.data_mask (4D Numpy array, optional) – Data mask to crop. E.g.
(num_of_images, y, x, channels)
.overlap (Tuple of 2 floats, optional) – Amount of minimum overlap on x and y dimensions. The values must be on range
[0, 1)
, that is,0%
or99%
of overlap. E. g.(y, x)
.padding (tuple of ints, optional) – Size of padding to be added on each axis
(y, x)
. E.g.(24, 24)
.verbose (bool, optional) – To print information about the crop to be made.
- Returns:
cropped_data (4D Numpy array) – Cropped image data. E.g.
(num_of_images, y, x, channels)
.cropped_data_mask (4D Numpy array, optional) – Cropped image data masks. E.g.
(num_of_images, y, x, channels)
.
Examples
# EXAMPLE 1 # Divide in crops of (256, 256) a given data with the minimum overlap X_train = np.ones((165, 768, 1024, 1)) Y_train = np.ones((165, 768, 1024, 1)) X_train, Y_train = crop_data_with_overlap(X_train, (256, 256, 1), Y_train, (0, 0)) # Notice that as the shape of the data has exact division with the wnanted crops shape so no overlap will be # made. The function will print the following information: # Minimum overlap selected: (0, 0) # Real overlapping (%): (0.0, 0.0) # Real overlapping (pixels): (0.0, 0.0) # (3, 4) patches per (x,y) axis # **** New data shape is: (1980, 256, 256, 1) # EXAMPLE 2 # Same as example 1 but with 25% of overlap between crops X_train, Y_train = crop_data_with_overlap(X_train, (256, 256, 1), Y_train, (0.25, 0.25)) # The function will print the following information: # Minimum overlap selected: (0.25, 0.25) # Real overlapping (%): (0.33203125, 0.3984375) # Real overlapping (pixels): (85.0, 102.0) # (4, 6) patches per (x,y) axis # **** New data shape is: (3960, 256, 256, 1) # EXAMPLE 3 # Same as example 1 but with 50% of overlap between crops X_train, Y_train = crop_data_with_overlap(X_train, (256, 256, 1), Y_train, (0.5, 0.5)) # The function will print the shape of the created array. In this example: # Minimum overlap selected: (0.5, 0.5) # Real overlapping (%): (0.59765625, 0.5703125) # Real overlapping (pixels): (153.0, 146.0) # (6, 8) patches per (x,y) axis # **** New data shape is: (7920, 256, 256, 1) # EXAMPLE 4 # Same as example 2 but with 50% of overlap only in x axis X_train, Y_train = crop_data_with_overlap(X_train, (256, 256, 1), Y_train, (0.5, 0)) # The function will print the shape of the created array. In this example: # Minimum overlap selected: (0.5, 0) # Real overlapping (%): (0.59765625, 0.0) # Real overlapping (pixels): (153.0, 0.0) # (6, 4) patches per (x,y) axis # **** New data shape is: (3960, 256, 256, 1)
- biapy.data.data_2D_manipulation.merge_data_with_overlap(data, original_shape, data_mask=None, overlap=(0, 0), padding=(0, 0), verbose=True, out_dir=None, prefix='')[source]
Merge data with an amount of overlap.
The opposite function is
crop_data_with_overlap()
.- Parameters:
data (4D Numpy array) – Data to merge. E.g.
(num_of_images, y, x, channels)
.original_shape (4D int tuple) – Shape of the original data. E.g.
(num_of_images, y, x, channels)
data_mask (4D Numpy array, optional) – Data mask to merge. E.g.
(num_of_images, y, x, channels)
.overlap (Tuple of 2 floats, optional) – Amount of minimum overlap on x and y dimensions. Should be the same as used in
crop_data_with_overlap()
. The values must be on range[0, 1)
, that is,0%
or99%
of overlap. E. g.(y, x)
.padding (tuple of ints, optional) – Size of padding to be added on each axis
(y, x)
. E.g.(24, 24)
.verbose (bool, optional) – To print information about the crop to be made.
out_dir (str, optional) – If provided an image that represents the overlap made will be saved. The image will be colored as follows: green region when
==2
crops overlap, yellow when2 < x < 6
and red when=<6
or more crops are merged.prefix (str, optional) – Prefix to save overlap map with.
- Returns:
merged_data (4D Numpy array) – Merged image data. E.g.
(num_of_images, y, x, channels)
.merged_data_mask (4D Numpy array, optional) – Merged image data mask. E.g.
(num_of_images, y, x, channels)
.
Examples
# EXAMPLE 1 # Merge the data of example 1 of 'crop_data_with_overlap' function # 1) CROP X_train = np.ones((165, 768, 1024, 1)) Y_train = np.ones((165, 768, 1024, 1)) X_train, Y_train = crop_data_with_overlap(X_train, (256, 256, 1), Y_train, (0, 0)) # 2) MERGE X_train, Y_train = merge_data_with_overlap( X_train, (165, 768, 1024, 1), Y_train, (0, 0), out_dir='out_dir') # The function will print the following information: # Minimum overlap selected: (0, 0) # Real overlapping (%): (0.0, 0.0) # Real overlapping (pixels): (0.0, 0.0) # (3, 4) patches per (x,y) axis # **** New data shape is: (165, 768, 1024, 1) # EXAMPLE 2 # Merge the data of example 2 of 'crop_data_with_overlap' function X_train, Y_train = merge_data_with_overlap( X_train, (165, 768, 1024, 1), Y_train, (0.25, 0.25), out_dir='out_dir') # The function will print the following information: # Minimum overlap selected: (0.25, 0.25) # Real overlapping (%): (0.33203125, 0.3984375) # Real overlapping (pixels): (85.0, 102.0) # (3, 5) patches per (x,y) axis # **** New data shape is: (165, 768, 1024, 1) # EXAMPLE 3 # Merge the data of example 3 of 'crop_data_with_overlap' function X_train, Y_train = merge_data_with_overlap( X_train, (165, 768, 1024, 1), Y_train, (0.5, 0.5), out_dir='out_dir') # The function will print the shape of the created array. In this example: # Minimum overlap selected: (0.5, 0.5) # Real overlapping (%): (0.59765625, 0.5703125) # Real overlapping (pixels): (153.0, 146.0) # (6, 8) patches per (x,y) axis # **** New data shape is: (165, 768, 1024, 1) # EXAMPLE 4 # Merge the data of example 1 of 'crop_data_with_overlap' function X_train, Y_train = merge_data_with_overlap( X_train, (165, 768, 1024, 1), Y_train, (0.5, 0), out_dir='out_dir') # The function will print the shape of the created array. In this example: # Minimum overlap selected: (0.5, 0) # Real overlapping (%): (0.59765625, 0.0) # Real overlapping (pixels): (153.0, 0.0) # (6, 4) patches per (x,y) axis # **** New data shape is: (165, 768, 1024, 1)
As example of different overlap maps are presented below.
- biapy.data.data_2D_manipulation.load_data_classification(data_dir, patch_shape, convert_to_rgb=True, expected_classes=None, cross_val=False, cross_val_nsplits=5, cross_val_fold=1, val_split=0.1, seed=0, shuffle_val=True)[source]
Load data to train classification methods.
- Parameters:
data_dir (str) – Path to the training data.
patch_shape (Tuple of ints) – Shape of the patch. E.g.
(y, x, channels)
.convert_to_rgb (bool, optional) – In case RGB images are expected, e.g. if
crop_shape
channel is 3, those images that are grayscale are converted into RGB.expected_classes (int, optional) – Expected number of classes to be loaded.
cross_val (bool, optional) – Whether to use cross validation or not.
cross_val_nsplits (int, optional) – Number of folds for the cross validation.
cross_val_fold (int, optional) – Number of the fold to be used as validation.
val_split (float, optional) – % of the train data used as validation (value between
0
and1
).seed (int, optional) – Seed value.
shuffle_val (bool, optional) – Take random training examples to create validation data.
- Returns:
X_data (4D Numpy array) – Train images. E.g.
(num_of_images, y, x, channels)
.Y_data (4D Numpy array) – Train images’ mask. E.g.
(num_of_images, y, x, channels)
.X_val (4D Numpy array, optional) – Validation images (
val_split > 0
). E.g.(num_of_images, y, x, channels)
.Y_val (4D Numpy array, optional) – Validation images’ mask (
val_split > 0
). E.g.(num_of_images, y, x, channels)
.all_ids (List of str) – Loaded data filenames.
val_index (List of ints) – Indexes of the samples beloging to the validation.