google/vit-base-patch16-224-in21k


Tags: transformers, apache-2.0, 86.4M params, dataset:imagenet-21k, arxiv:2010.11929, arxiv:2006.03677, license:apache-2.0



# pull & run locally
pip install mlforge-sdk && mlforge pull google/vit-base-patch16-224-in21k

Model details

Task: Image Feature Extraction
Provider: google
Framework: transformers
Parameters: 86.4M
License: apache-2.0
Downloads: 1.8M
Likes: 411
Paper: arXiv:2010.11929
Updated: 2024-02-05

About google/vit-base-patch16-224-in21k

This is a Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at a resolution of 224x224 pixels. It was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al. and first released in this repository. The weights here, however, were converted from the timm repository by Ross Wightman, who had already converted them from JAX to PyTorch. Credits go to him.
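
As a rough illustration of how a checkpoint like this is typically used for feature extraction, the sketch below loads it with the `transformers` classes `ViTImageProcessor` and `ViTModel` and splits the output into a per-image embedding and per-patch embeddings. The helper names (`load_pretrained`, `extract_features`, `embed_image`) are hypothetical and not part of this page or of any library. Taking the first ([CLS]) token as the image embedding is a common convention, assumed here rather than prescribed by the model card. The shapes in the comments assume the base configuration: 16x16 patches on a 224x224 input give 14 x 14 = 196 patch tokens plus one [CLS] token, each of width 768.

```python
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

MODEL_ID = "google/vit-base-patch16-224-in21k"


def load_pretrained(model_id=MODEL_ID):
    """Download (or load from cache) the image processor and the ViT backbone."""
    processor = ViTImageProcessor.from_pretrained(model_id)
    model = ViTModel.from_pretrained(model_id)
    return processor, model


def extract_features(model, pixel_values):
    """Run the backbone and return (cls_embedding, patch_embeddings).

    pixel_values: float tensor of shape (batch, 3, 224, 224).
    """
    model.eval()
    with torch.no_grad():
        outputs = model(pixel_values=pixel_values)
    hidden = outputs.last_hidden_state  # (batch, 1 + num_patches, hidden_size)
    cls_embedding = hidden[:, 0]        # assumed convention: [CLS] token as image embedding
    patch_embeddings = hidden[:, 1:]    # one vector per 16x16 patch
    return cls_embedding, patch_embeddings


def embed_image(image, processor, model):
    """Resize and normalize a PIL image, then extract its features."""
    inputs = processor(images=image, return_tensors="pt")
    return extract_features(model, inputs["pixel_values"])
```

To try it, call `processor, model = load_pretrained()` once (this needs network access on first use), then `embed_image(Image.open("photo.jpg"), processor, model)`, where `photo.jpg` is a placeholder path. For the base model the two returned tensors should have shapes `(1, 768)` and `(1, 196, 768)`.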

Related Image Feature Extraction

facebook/dinov2-small · Image Feature Extraction · 22.1M params · 2.9M downloads · 67 likes
timm/vit_small_patch14_reg4_dinov2.lvd142m · Image Feature Extraction · 1.5M downloads · 7 likes
nomic-ai/nomic-embed-vision-v1.5 · Image Feature Extraction · 92.9M params · 1.3M downloads · 220 likes
facebook/dinov2-base · Image Feature Extraction · 86.6M params · 1.3M downloads · 181 likes
facebook/dinov2-large · Image Feature Extraction · 304.4M params · 970.9K downloads · 113 likes
timm/vit_small_patch14_dinov2.lvd142m · Image Feature Extraction · 838.2K downloads · 6 likes