google/vit-base-patch16-224

transformers · apache-2.0 · 86.6M params · dataset:imagenet-1k · dataset:imagenet-21k · arxiv:2010.11929 · arxiv:2006.03677

# pull & run locally
pip install mlforge-sdk && mlforge pull google/vit-base-patch16-224
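
Since the listing names transformers as the framework, the model can presumably also be run directly from Python. The following is a minimal sketch, assuming the `transformers`, `torch`, and `Pillow` packages are installed and the weights can be downloaded under the model ID shown on this page. The helper names `top_predictions` and `classify` are illustrative, not part of any library.

```python
import math

MODEL_ID = "google/vit-base-patch16-224"


def top_predictions(logits, id2label, k=5):
    """Turn raw class scores into the k most probable (label, probability) pairs."""
    # Numerically stable softmax in plain Python.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


def classify(image_path, k=5):
    """Classify one image file; the weights are downloaded on first use."""
    # Heavy imports stay local so the helper above works without them.
    import torch
    from PIL import Image
    from transformers import ViTForImageClassification, ViTImageProcessor

    processor = ViTImageProcessor.from_pretrained(MODEL_ID)
    model = ViTForImageClassification.from_pretrained(MODEL_ID)
    model.eval()

    # The processor resizes and normalizes the image to the model's input size.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    return top_predictions(logits.tolist(), model.config.id2label, k)
```

Calling `classify("cat.jpg")`, where `cat.jpg` stands in for any local image file, would return up to five ImageNet-1k labels with their probabilities.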

Model details

Task: Image Classification
Provider: google
Framework: transformers
Parameters: 86.6M
License: apache-2.0
Downloads: 5.7M
Likes: 980
Paper: arXiv:2010.11929
Updated: 2023-09-05

About google/vit-base-patch16-224

Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at a resolution of 224x224 and fine-tuned on ImageNet 2012, also known as ImageNet-1k (1 million images, 1,000 classes), at the same resolution. It was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al. (arXiv:2010.11929) and first released by the paper's authors. The weights distributed here were converted from the timm repository, where Ross Wightman had already ported them from JAX to PyTorch. Credit for that conversion goes to him.
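
The "patch16-224" part of the model name fixes how an image is turned into a token sequence. As a rough illustration, the arithmetic can be sketched as below, assuming square RGB inputs and one extra classification token prepended to the patch sequence, as described in the ViT paper. The function name `vit_sequence_shape` is made up for this example.

```python
def vit_sequence_shape(image_size=224, patch_size=16, channels=3):
    """Return (patches per side, patch count, values per patch, sequence length)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size              # patches along one edge
    num_patches = per_side ** 2                      # non-overlapping square patches
    patch_dim = patch_size * patch_size * channels   # pixel values in one flattened patch
    seq_len = num_patches + 1                        # patches plus one classification token
    return per_side, num_patches, patch_dim, seq_len


print(vit_sequence_shape())  # (14, 196, 768, 197)
```

A 224x224 image is therefore split into a 14x14 grid of 196 patches, each holding 768 raw pixel values before projection, and the transformer processes a sequence of 197 tokens.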

Related Image Classification

timm/mobilenetv3_small_100.lamb_in1k · Image Classification · 2.6M params · 22.7M downloads · 80 likes
Falconsai/nsfw_image_detection · Image Classification · 85.8M params · 9.8M downloads · 1.1K likes
dima806/fairface_age_image_detection · Image Classification · 85.8M params · 6.6M downloads · 75 likes
timm/resnet50.a1_in1k · Image Classification · 2.6M downloads · 43 likes
apple/mobilevit-small · Image Classification · 2.4M downloads · 91 likes
timm/resnet18.a1_in1k · Image Classification · 1.8M downloads · 14 likes