ViT
- class biapy.models.vit.VisionTransformer(ndim=2, global_pool=False, **kwargs)
Bases: VisionTransformer
Masked autoencoder (MAE) with a Vision Transformer (ViT) backbone.
Reference: Masked Autoencoders Are Scalable Vision Learners.
- Parameters:
ndim (int, optional) – Number of spatial dimensions of the input (2 for 2D images, 3 for 3D volumes).
global_pool (bool, optional) – Whether to build the image representation by global average pooling of the patch tokens rather than by taking the [CLS] token.
- Returns:
model – ViT model.
- Return type:
Torch model
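To illustrate what the global_pool flag typically controls, the sketch below contrasts the two pooling strategies a ViT head commonly uses, following the MAE reference implementation: averaging the patch tokens versus taking the [CLS] token. The helper pool_tokens is hypothetical (not part of BiaPy) and operates on plain lists for clarity; the real model works on torch tensors.

```python
# Hypothetical sketch of the two pooling modes a ViT head may use.
# tokens[0] is the [CLS] token; the rest are patch tokens.
def pool_tokens(tokens, global_pool):
    """Return the image representation for a sequence of token embeddings."""
    if global_pool:
        # Global average pooling: mean over the patch tokens, [CLS] excluded.
        patches = tokens[1:]
        dim = len(patches[0])
        return [sum(t[d] for t in patches) / len(patches) for d in range(dim)]
    # Default: the [CLS] token itself summarizes the image.
    return tokens[0]

tokens = [
    [1.0, 3.0],  # [CLS] token
    [0.0, 4.0],  # patch token 1
    [2.0, 0.0],  # patch token 2
]
print(pool_tokens(tokens, global_pool=True))   # → [1.0, 2.0]
print(pool_tokens(tokens, global_pool=False))  # → [1.0, 3.0]
```

With global_pool=True the output is the mean of the two patch tokens; with global_pool=False it is the [CLS] embedding unchanged.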