google/vit-base-patch16-224

Image Classification · google · 5.7M downloads · 980 likes
transformers · apache-2.0 · 86.6M params · dataset:imagenet-1k · dataset:imagenet-21k · arxiv:2010.11929 · arxiv:2006.03677

# pull & run locally
pip install mlforge-sdk && mlforge pull google/vit-base-patch16-224
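
Once the dependencies are in place, classification with the listed framework (transformers) might look like the sketch below. It assumes `transformers`, `torch`, and `Pillow` are installed and loads the checkpoint by its model ID through the transformers library rather than from whatever local layout `mlforge pull` produces, which this page does not describe. The function name `classify_image` and the image path in the usage comment are hypothetical.

```python
def classify_image(image_path: str,
                   model_id: str = "google/vit-base-patch16-224") -> str:
    """Return the predicted ImageNet-1k label for one image file."""
    # Imports are local so that defining the function does not load the heavy
    # dependencies; they are only needed when the function is called.
    import torch
    from PIL import Image
    from transformers import ViTForImageClassification, ViTImageProcessor

    # The processor resizes and normalizes the image to the 224x224 input.
    processor = ViTImageProcessor.from_pretrained(model_id)
    model = ViTForImageClassification.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, 1000)

    predicted_idx = logits.argmax(-1).item()
    return model.config.id2label[predicted_idx]


# Example call (hypothetical file name):
# print(classify_image("cat.jpg"))
```

The first call downloads the weights if they are not already cached, so it needs network access.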

Model details

Task: Image Classification
Provider: google
Framework: transformers
Parameters: 86.6M
License: apache-2.0
Downloads: 5.7M
Likes: 980
Paper: arXiv:2010.11929
Updated: 2023-09-05

About google/vit-base-patch16-224

Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, then fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at the same resolution. It was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al. and first released in this repository. The weights themselves were converted from the timm repository by Ross Wightman, who had already ported them from JAX to PyTorch; credit goes to him.
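
The "patch16-224" in the model name refers to 16x16 patches cut from a 224x224 input. As a rough illustration of what that implies for the transformer's input sequence, the sketch below works out the patch grid and token count. The helper name `vit_sequence_shape` is hypothetical, and the single extra classification token and the 3 colour channels are assumptions taken from the standard ViT setup in the paper, not from this page.

```python
def vit_sequence_shape(image_size: int = 224,
                       patch_size: int = 16,
                       channels: int = 3) -> tuple:
    """Return (patches_per_side, num_patches, num_tokens, flat_patch_dim)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    patches_per_side = image_size // patch_size      # 224 / 16 = 14
    num_patches = patches_per_side ** 2              # 14 * 14 = 196
    num_tokens = num_patches + 1                     # plus one [CLS] token
    flat_patch_dim = patch_size * patch_size * channels  # 16 * 16 * 3 = 768
    return patches_per_side, num_patches, num_tokens, flat_patch_dim


print(vit_sequence_shape())  # (14, 196, 197, 768)
```

So each image is presented to the model as a sequence of 197 tokens: 196 flattened patches plus the classification token.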

Related Image Classification

timm/mobilenetv3_small_100.lamb_in1k · Image Classification · 2.6M params · 22.7M downloads · 80 likes
Falconsai/nsfw_image_detection · Image Classification · 85.8M params · 9.8M downloads · 1.1K likes
dima806/fairface_age_image_detection · Image Classification · 85.8M params · 6.6M downloads · 75 likes
timm/resnet50.a1_in1k · Image Classification · 2.6M downloads · 43 likes
apple/mobilevit-small · Image Classification · 2.4M downloads · 91 likes
timm/resnet18.a1_in1k · Image Classification · 1.8M downloads · 14 likes