
google/vit-base-patch16-224-in21k

Image Feature Extraction · google · 1.8M downloads · 411 likes
transformers · apache-2.0 · 86.4M params · dataset:imagenet-21k · arxiv:2010.11929 · arxiv:2006.03677 · license:apache-2.0

image feature extraction · transformers model

```shell
# pull & run locally
pip install mlforge-sdk && mlforge pull google/vit-base-patch16-224-in21k
```

Model details

- Task: Image Feature Extraction
- Provider: google
- Framework: transformers
- Parameters: 86.4M
- License: apache-2.0
- Downloads: 1.8M
- Likes: 411
- Paper: arXiv:2010.11929
- Updated: 2024-02-05
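As a sanity check on the 86.4M figure, the parameter count can be tallied layer by layer. This is a minimal sketch that assumes the standard ViT-Base configuration (12 layers, hidden size 768, MLP size 3072), learned position embeddings, a [CLS] token, and a 768→768 pooler on top with no classification head; these architecture details are assumptions not stated on this page, and the helper names are hypothetical. The number of attention heads does not change the count, because heads split the 768-dimensional projections rather than adding weights.

```python
# Assumed ViT-Base/16 hyperparameters (the "Base" configuration from the ViT paper).
IMAGE_SIZE = 224
PATCH_SIZE = 16
CHANNELS = 3
HIDDEN = 768
LAYERS = 12
MLP = 3072


def linear(in_features: int, out_features: int) -> int:
    """Weight matrix plus bias vector of a dense layer."""
    return in_features * out_features + out_features


def layer_norm(dim: int) -> int:
    """LayerNorm scale and shift vectors."""
    return 2 * dim


def vit_base_params(include_pooler: bool = True) -> int:
    """Hypothetical tally of ViT-Base/16 encoder parameters under the assumptions above."""
    num_patches = (IMAGE_SIZE // PATCH_SIZE) ** 2
    # Patch embedding: a linear map from each flattened 16x16x3 patch to the hidden size.
    patch_embed = linear(CHANNELS * PATCH_SIZE * PATCH_SIZE, HIDDEN)
    cls_token = HIDDEN
    pos_embed = (num_patches + 1) * HIDDEN  # one position per patch plus [CLS]
    per_layer = (
        layer_norm(HIDDEN)            # LayerNorm before attention
        + 3 * linear(HIDDEN, HIDDEN)  # query, key, value projections
        + linear(HIDDEN, HIDDEN)      # attention output projection
        + layer_norm(HIDDEN)          # LayerNorm before the MLP
        + linear(HIDDEN, MLP)         # MLP expansion
        + linear(MLP, HIDDEN)         # MLP contraction
    )
    total = patch_embed + cls_token + pos_embed + LAYERS * per_layer + layer_norm(HIDDEN)
    if include_pooler:
        total += linear(HIDDEN, HIDDEN)  # dense pooler applied to the [CLS] token
    return total


print(vit_base_params())  # 86389248
```

Under these assumptions the total is 86,389,248, which rounds to the listed 86.4M; dropping the pooler gives 85,798,656.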

About google/vit-base-patch16-224-in21k

Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at a resolution of 224×224. It was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al. and first released in this repository. The weights in this checkpoint, however, were converted from the timm repository by Ross Wightman, who had previously converted them from JAX to PyTorch. Credit goes to him.
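The "16x16 words" in the paper title are image patches. The sketch below, assuming the 224×224 RGB input and 16×16 patch size implied by the model name, shows how one image becomes a token sequence: 14×14 = 196 patches, each flattened to 16·16·3 = 768 values, plus one [CLS] token for 197 tokens in total. `patchify` and `sequence_length` are hypothetical helpers written for illustration, not functions from transformers or timm.

```python
# Assumed ViT-B/16 input settings, read off the model name: 224x224 RGB input, 16x16 patches.
IMAGE_SIZE = 224
PATCH_SIZE = 16
CHANNELS = 3


def sequence_length(image_size: int, patch_size: int, cls_token: bool = True) -> int:
    """Number of tokens the encoder sees: one per patch, plus an optional [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by the patch size")
    side = image_size // patch_size
    return side * side + (1 if cls_token else 0)


def patchify(image: list, patch_size: int) -> list:
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Patches are emitted in row-major order; within a patch, pixels are read row by row
    and all channels of a pixel are kept together.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            vector = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    vector.extend(image[row][col])
            patches.append(vector)
    return patches


# A dummy all-zero 224x224 RGB image stands in for a real, preprocessed input.
image = [[[0.0] * CHANNELS for _ in range(IMAGE_SIZE)] for _ in range(IMAGE_SIZE)]
patches = patchify(image, PATCH_SIZE)
print(len(patches), len(patches[0]))            # 196 768
print(sequence_length(IMAGE_SIZE, PATCH_SIZE))  # 197
```

In the transformers implementation the patch embedding is a strided convolution whose kernel size and stride both equal the patch size, which amounts to a learned linear projection of each flattened patch to the 768-dimensional hidden size. The order of values inside a patch differs from this sketch, but the patch count and sequence length are the same.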

Related Image Feature Extraction models

- facebook/dinov2-small · Image Feature Extraction · 22.1M params · 2.9M downloads · 67 likes
- timm/vit_small_patch14_reg4_dinov2.lvd142m · Image Feature Extraction · 1.5M downloads · 7 likes
- nomic-ai/nomic-embed-vision-v1.5 · Image Feature Extraction · 92.9M params · 1.3M downloads · 220 likes
- facebook/dinov2-base · Image Feature Extraction · 86.6M params · 1.3M downloads · 181 likes
- facebook/dinov2-large · Image Feature Extraction · 304.4M params · 970.9K downloads · 113 likes
- timm/vit_small_patch14_dinov2.lvd142m · Image Feature Extraction · 838.2K downloads · 6 likes