ViT

class biapy.models.vit.VisionTransformer(ndim=2, global_pool=False, **kwargs)[source]

Bases: VisionTransformer

Masked autoencoder (MAE) with a Vision Transformer (ViT) backbone.

Reference: Masked Autoencoders Are Scalable Vision Learners (He et al., arXiv:2111.06377).

Parameters:
  • ndim (int, optional) – Number of spatial dimensions of the input: 2 for 2D images, 3 for 3D volumes.

  • global_pool (bool, optional) – Whether to build the final representation by average-pooling the patch tokens instead of using the class token.

Returns:

model – ViT model.

Return type:

Torch model
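
A minimal usage sketch. It assumes the remaining keyword arguments are forwarded to the underlying timm VisionTransformer base class (img_size, patch_size, embed_dim, and so on come from that class, not from this section):

```python
from functools import partial

import torch
import torch.nn as nn

from biapy.models.vit import VisionTransformer

# Assumption: **kwargs pass through to the timm VisionTransformer base
# class, so the standard ViT-Base/16 hyper-parameters can be given here.
model = VisionTransformer(
    ndim=2,                # 2D input
    global_pool=False,     # use the class token as the final representation
    img_size=224,
    patch_size=16,
    in_chans=3,
    embed_dim=768,
    depth=12,
    num_heads=12,
    mlp_ratio=4,
    qkv_bias=True,
    norm_layer=partial(nn.LayerNorm, eps=1e-6),
)

x = torch.randn(1, 3, 224, 224)       # (batch, channels, height, width)
feats = model.forward_features(x)     # one feature vector per sample
```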

forward_features(x)[source]

Run the ViT encoder on x (patch embedding, positional encoding, Transformer blocks) and return one feature vector per sample: the normalized average of the patch tokens when global_pool is enabled, otherwise the class token.

biapy.models.vit.vit_base_patch16(**kwargs)[source]

Build a ViT backbone with 16×16 patches following the standard ViT-Base configuration (embed_dim=768, depth=12, num_heads=12); keyword arguments are forwarded to VisionTransformer.

biapy.models.vit.vit_large_patch16(**kwargs)[source]

Build a ViT backbone with 16×16 patches following the standard ViT-Large configuration (embed_dim=1024, depth=24, num_heads=16).

biapy.models.vit.vit_huge_patch14(**kwargs)[source]

Build a ViT backbone with 14×14 patches following the standard ViT-Huge configuration (embed_dim=1280, depth=32, num_heads=16).
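
A hedged example for the preset constructors, again assuming that extra keyword arguments reach the underlying timm constructor (num_classes is a standard timm argument and is assumed to pass through):

```python
import torch

from biapy.models.vit import vit_base_patch16

# Sketch: the factory is assumed to fix the ViT-Base/16 hyper-parameters
# and forward the remaining kwargs, so num_classes attaches a linear
# classification head on top of the pooled features.
model = vit_base_patch16(ndim=2, global_pool=True, num_classes=2)

x = torch.randn(4, 3, 224, 224)
logits = model(x)  # expected shape: (4, 2)
```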