Explore AI Datasets

Browse 621 datasets on MLForge, which unifies AI/ML dataset discovery across Hugging Face, PyTorch, OpenML, and Roboflow. Download and run them locally for free.

General 264

huggingface/documentation-images General 2.8M 159 🤗 HF
xlangai/ubuntu_osworld_file_cache General 1.4M 32 🤗 HF
banned-historical-archives/banned-historical-archives General 1.3M 47 🤗 HF
osv5m/osv5m General 1.2M 54 🤗 HF
ryanmarten/OpenThoughts-1k-sample General 1.2M 28 🤗 HF
mteb/results General ·1.6 GB 919.0K 4 🤗 HF
hf-doc-build/doc-build-dev General 865.1K 37 🤗 HF
mlfoundations/dclm-baseline-1.0 General 551.2K 288 🤗 HF
mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M General ·43 TB 481.1K 72 🤗 HF
SWE-bench/SWE-bench_Multilingual General ·1.1 MB 480.7K 18 🤗 HF
just-me7ss/American-Sign-Language-Dataset General ·49 GB 457.9K 2 🤗 HF
jat-project/jat-dataset-tokenized General ·829 GB 432.0K 15 🤗 HF
PsiBotAI/SynData General ·27 TB 363.6K 182 🤗 HF
Symato/cc General ·88 GB 288.6K 3 🤗 HF
siril-spcc/gaia General ·21 GB 285.0K 12 🤗 HF
HuggingFaceM4/the_cauldron General ·282 GB 251.7K 546 🤗 HF
Rowan/hellaswag General ·35 MB 239.9K 182 🤗 HF
Kazimir-ai/text-to-image-prompts General ·4.8 MB 227.9K 9 🤗 HF
shenyunhang/VoiceAssistant-400K General ·923 GB 220.5K 3 🤗 HF
ZahidYasinMittha/American-Sign-Language-Dataset General ·49 GB 211.4K 4 🤗 HF
applied-ai-018/pretraining_v1-omega_books General ·11 TB 211.1K 6 🤗 HF
stanford-vision-lab/gpic General ·12 TB 207.0K 141 🤗 HF
allenai/winogrande General ·4.2 MB 203.7K 83 🤗 HF
nvidia/PhysicalAI-Autonomous-Vehicles General ·224 TB 199.2K 923 🤗 HF
hf-doc-build/doc-build General ·532 GB 192.7K 39 🤗 HF
EssentialAI/essential-web-v1.0 General ·69 TB 192.1K 227 🤗 HF
anon8231489123/ShareGPT_Vicuna_unfiltered General ·5.8 GB 187.6K 882 🤗 HF
allenai/objaverse General ·8.1 TB 178.9K 453 🤗 HF
jeyasuryaur/cricket-data-by-cricsheet General 162.1K 🤗 HF
nvidia/PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios General ·7.5 TB 158.9K 14 🤗 HF
bluuebunny/arxiv_metadata_by_year General ·5.4 GB 152.0K 9 🤗 HF
DL3DV/DL3DV-Benchmark General ·1.2 TB 150.6K 36 🤗 HF
pwc-archive/evaluation-tables General ·131 MB 146.6K 🤗 HF
huggingface/badges General ·680 KB 145.9K 48 🤗 HF
defeatbeta/yahoo-finance-data General ·123 GB 141.5K 98 🤗 HF
japanese-asr/whisper_transcriptions.reazon_speech_all General ·2.3 TB 140.5K 14 🤗 HF
Gourieff/ReActor General ·8.1 GB 137.3K 294 🤗 HF
hf-internal-testing/librispeech_asr_dummy General ·139 MB 127.9K 10 🤗 HF
espnet/yodas General ·33 TB 127.3K 141 🤗 HF
HuggingFaceM4/FineVision General ·9.1 TB 126.6K 498 🤗 HF
SwayStar123/preprocessed_commoncatalog-cc-by General ·255 GB 124.3K 2 🤗 HF
artur-muratov/multilingual-speech-commands-15lang General ·11 GB 122.0K 2 🤗 HF
SimulaMet-HOST/visem-tracking-graphs General 120.8K 🤗 HF
EleutherAI/hendrycks_math General ·35 MB 119.2K 106 🤗 HF
applied-ai-018/pretraining_v1-omega General ·8.4 TB 110.6K 🤗 HF
akasheroor/American-Sign-Language-Dataset General ·49 GB 108.9K 4 🤗 HF
McAuley-Lab/Amazon-Reviews-2023 General ·1.4 TB 107.6K 313 🤗 HF
JustinZekai/SlidesBench General ·29 GB 103.2K 🤗 HF

Text Generation 74

Salesforce/wikitext Text Generation ·4.7 GB 1.3M 724 🤗 HF
openai/gsm8k Text Generation ·4.8 MB 880.9K 1.4K 🤗 HF
allenai/c4 Text Generation ·488 KB 827.1K 601 🤗 HF
HuggingFaceFW/finephrase Text Generation ·6.9 TB 472.2K 129 🤗 HF
HuggingFaceFW/fineweb-edu Text Generation ·5.6 TB 399.4K 1.2K 🤗 HF
permutans/arxiv-papers-by-subject Text Generation ·1.6 GB 382.5K 23 🤗 HF
HuggingFaceFW/fineweb Text Generation ·107 TB 318.9K 2.9K 🤗 HF
isaacus/open-australian-legal-corpus Text Generation ·170 GB 214.4K 92 🤗 HF
Zyphra/Zyda-2 Text Generation ·13 TB 174.7K 98 🤗 HF
jobs-git/Zyda-2 Text Generation ·1.5 TB 160.7K 1 🤗 HF
wikimedia/wikipedia Text Generation ·1.3 TB 154.9K 1.3K 🤗 HF
liwu/MNBVC Text Generation ·3.6 TB 139.2K 633 🤗 HF
airtrain-ai/fineweb-edu-fortified Text Generation ·1.6 TB 136.5K 65 🤗 HF
anisoleai/fineweb-tokenized Text Generation ·7.5 TB 131.5K 2 🤗 HF
HuggingFaceH4/MATH-500 Text Generation ·205 KB 126.9K 316 🤗 HF
OpenSQZ/AutoMathText-V2 Text Generation ·7.1 TB 122.2K 78 🤗 HF
legacy-datasets/wikipedia Text Generation ·44 GB 121.6K 645 🤗 HF
AlgorithmicResearchGroup/arxiv_s2orc_parsed Text Generation ·41 GB 110.8K 27 🤗 HF
epfml/FineWeb-HQ Text Generation ·14 TB 108.4K 9 🤗 HF
HuggingFaceFW/fineweb-2 Text Generation ·45 TB 95.3K 827 🤗 HF
open-index/open-github Text Generation ·50 GB 94.0K 9 🤗 HF
openbmb/Ultra-FineWeb-L3 Text Generation ·1.7 TB 93.3K 302 🤗 HF
jhu-clsp/ettin-pretraining-data Text Generation ·2.4 TB 88.3K 9 🤗 HF
google/IFEval Text Generation ·2.9 MB 88.1K 152 🤗 HF
roneneldan/TinyStories Text Generation ·16 GB 80.3K 1.0K 🤗 HF
HuggingFaceFW/finepdfs Text Generation ·12 TB 80.1K 882 🤗 HF
openbmb/Ultra-FineWeb Text Generation ·8.9 TB 79.4K 392 🤗 HF
allenai/dolma3_mix-6T Text Generation ·4.0 TB 78.6K 32 🤗 HF
tatsu-lab/alpaca Text Generation ·304 MB 74.7K 998 🤗 HF
theelderemo/genius-lyrics-cleaned Text Generation ·2.4 GB 73.5K 12 🤗 HF
Skylion007/openwebtext Text Generation ·80 GB 72.0K 522 🤗 HF
allenai/dolma3_mix-6T-1025-7B Text Generation ·8.9 TB 70.7K 53 🤗 HF
allenai/WildChat-4.8M Text Generation ·14 GB 65.2K 164 🤗 HF
Rikunarita-ORG/Wikinium Text Generation ·3.7 GB 64.2K 7 🤗 HF
nvidia/Nemotron-CC-Math-v1 Text Generation ·242 GB 63.1K 88 🤗 HF
allenai/olmo-mix-1124 Text Generation ·6.8 TB 61.0K 88 🤗 HF
muset-ai/DeepResearch-Bench-II-Dataset Text Generation ·938 MB 59.5K 2 🤗 HF
HuggingFaceH4/ultrachat_200k Text Generation ·47 GB 57.5K 736 🤗 HF
llamafactory/demo_data Text Generation ·13 MB 51.0K 1 🤗 HF
HuggingFaceFW/fineweb-edu-score-2 Text Generation ·18 TB 48.4K 87 🤗 HF
X779/Danbooruwildcards Text Generation ·16 GB 46.0K 13 🤗 HF
anoynsharechat/sharechat Text Generation ·5.0 GB 45.7K 🤗 HF
openbmb/UltraData-SFT-2605 Text Generation ·335 GB 42.0K 355 🤗 HF
allenai/MADLAD-400 Text Generation ·35 TB 41.6K 170 🤗 HF
ise-uiuc/Magicoder-OSS-Instruct-75K Text Generation ·578 MB 39.1K 167 🤗 HF
librarian-bots/arxiv-metadata-snapshot Text Generation ·283 GB 38.6K 19 🤗 HF
allenai/dolma3_pool Text Generation ·16 TB 37.8K 36 🤗 HF
wannaphong/wikipedia-monthly Text Generation ·748 GB 37.4K 🤗 HF

Robotics 41

genrobot2025/10Kh-RealOmin-OpenData Robotics 960.8K 219 🤗 HF
IPEC-COMMUNITY/bridge_orig_lerobot Robotics 763.7K 23 🤗 HF
nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim Robotics ·1.7 TB 538.0K 234 🤗 HF
IPEC-COMMUNITY/droid_lerobot Robotics ·365 GB 536.8K 31 🤗 HF
IPEC-COMMUNITY/language_table_lerobot Robotics ·71 GB 333.7K 1 🤗 HF
ad1t7a/10Kh-RealOmin-OpenData Robotics ·3.1 TB 332.3K 🤗 HF
cadene/droid Robotics ·373 GB 295.5K 16 🤗 HF
cadene/droid_1.0.1 Robotics ·372 GB 285.5K 44 🤗 HF
IPEC-COMMUNITY/fractal20220817_data_lerobot Robotics ·20 GB 205.2K 12 🤗 HF
tencent/Hy-Embodied-0.5-VLA-Data Robotics ·19 TB 105.2K 12 🤗 HF
Whoisjutanlee/2.1tbofdata Robotics ·2.5 TB 103.6K 🤗 HF
rad1d1m123/OmniAction Robotics ·2.5 TB 97.8K 🤗 HF
XDOF/ABC-130k Robotics ·31 TB 87.1K 50 🤗 HF
hosam12kalad/OmniAction Robotics ·2.4 TB 85.8K 🤗 HF
sini-21/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim Robotics ·1.7 TB 76.0K 🤗 HF
IPEC-COMMUNITY/kuka_lerobot Robotics ·33 GB 74.9K 🤗 HF
Rooftech650/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim Robotics ·1.7 TB 69.4K 🤗 HF
nvidia/PhysicalAI-Robotics-Open-H-Embodiment Robotics ·4.0 TB 65.5K 38 🤗 HF
Hoshipu/real_robot_data Robotics ·183 GB 63.2K 🤗 HF
fpvlabs/stera-10m Robotics ·1.6 TB 61.5K 81 🤗 HF
Saberlve/bridgev2 Robotics 60.4K 🤗 HF
InternRobotics/InternData-N1 Robotics ·14 TB 60.1K 78 🤗 HF
behavior-1k/2025-challenge-demos Robotics ·1.9 TB 60.0K 36 🤗 HF
cadene/agibot_alpha_v30 Robotics ·1.7 TB 50.5K 🤗 HF
yaak-ai/L2D Robotics ·5.1 TB 50.5K 48 🤗 HF
OpenMOSS-Team/OmniAction Robotics ·2.5 TB 48.6K 283 🤗 HF
balatubs123/kumagong Robotics ·330 GB 46.0K 🤗 HF
arekborucki/bridge_orig_lerobot Robotics ·20 GB 44.5K 🤗 HF
agibot-world/AgiBotWorld2026 Robotics ·9.0 TB 36.2K 46 🤗 HF
physical-intelligence/libero Robotics ·33 GB 34.4K 85 🤗 HF
ygtxr1997/bridge_orig_lerobot Robotics ·20 GB 32.4K 🤗 HF
Facebear/XVLA-Soft-Fold Robotics ·443 GB 29.4K 12 🤗 HF
tars-robotics/WIYH Robotics ·34 TB 28.4K 🤗 HF
BeingBeyond/PND_Adam-U_pick-simple Robotics ·9.9 GB 28.1K 4 🤗 HF
USC-PSI-Lab/humanoid-everyday Robotics ·2.4 TB 28.0K 40 🤗 HF
HuggingFaceVLA/libero Robotics ·65 GB 27.1K 64 🤗 HF
sriramsk/droid_lerobot Robotics ·215 GB 23.9K 🤗 HF
RoboVerseOrg/roboverse_data Robotics ·42 GB 19.8K 23 🤗 HF
lerobot/pusht Robotics ·177 MB 19.4K 51 🤗 HF
chenhn02/MetaFold Robotics ·24 GB 17.6K 🤗 HF
lerobot/aloha_sim_transfer_cube_human Robotics ·2.9 GB 15.2K 14 🤗 HF

Question Answering 31

cais/mmlu Question Answering ·89 GB 454.6K 775 🤗 HF
allenai/ai2_arc Question Answering ·90 MB 434.6K 358 🤗 HF
TIGER-Lab/MMLU-Pro Question Answering ·128 MB 161.7K 488 🤗 HF
allenai/openbookqa Question Answering ·87 MB 153.6K 133 🤗 HF
rajpurkar/squad Question Answering ·16 MB 138.1K 368 🤗 HF
Williamsanderson/MedQA-Darija-MultiLingual Question Answering ·107 GB 112.3K 4 🤗 HF
Idavidrein/gpqa Question Answering ·6.8 MB 103.7K 467 🤗 HF
locuslab/TOFU Question Answering ·6.7 MB 86.7K 55 🤗 HF
allenai/sciq Question Answering ·91 MB 84.6K 143 🤗 HF
nvidia/OpenMathInstruct-2 Question Answering ·81 GB 81.5K 245 🤗 HF
hotpotqa/hotpot_qa Question Answering ·396 MB 80.5K 306 🤗 HF
RUC-NLPIR/GISA Question Answering ·488 KB 66.4K 3 🤗 HF
MMMU/MMMU Question Answering ·9.0 GB 66.1K 329 🤗 HF
zai-org/LongBench Question Answering ·590 MB 58.9K 184 🤗 HF
Tevatron/browsecomp-plus Question Answering ·11 GB 57.6K 35 🤗 HF
hails/mmlu_no_train Question Answering ·171 MB 55.4K 29 🤗 HF
lhoestq/custom_squad Question Answering ·448 MB 51.3K 🤗 HF
ybisk/piqa Question Answering ·116 MB 50.9K 105 🤗 HF
mandarjoshi/trivia_qa Question Answering ·77 GB 49.4K 194 🤗 HF
tau/commonsense_qa Question Answering ·60 MB 46.8K 152 🤗 HF
coastalcph/lex_glue Question Answering ·7.0 GB 44.3K 77 🤗 HF
databricks/databricks-dolly-15k Question Answering ·254 MB 38.1K 986 🤗 HF
farhanhubble/jfk-archives Question Answering ·55 GB 37.2K 🤗 HF
openlifescienceai/medmcqa Question Answering ·1.6 GB 36.2K 229 🤗 HF
fka/prompts.chat Question Answering ·28 MB 32.2K 9.8K 🤗 HF
rajpurkar/squad_v2 Question Answering ·17 MB 29.8K 255 🤗 HF
qiaojin/PubMedQA Question Answering ·4.9 GB 27.7K 328 🤗 HF
corbyrosset/researchy_questions Question Answering ·1.7 GB 25.1K 37 🤗 HF
AssistantBench/AssistantBench Question Answering ·60 KB 22.1K 23 🤗 HF
Salesforce/xlam-function-calling-60k Question Answering ·451 MB 20.3K 642 🤗 HF
AmazonScience/document-haystack Question Answering ·10 GB 19.4K 20 🤗 HF

Image To Text 21

Text Classification 20

Image Classification 16

Automatic Speech Recognition 16

Image Segmentation 12

Other 11

Video Classification 11

Visual Question Answering 10

Time Series Forecasting 8

Image Text To Text 6

Text2text Generation 6

Text To Image 6

Image To Image 6

Fill Mask 5

Text To Video 5

Text To Speech 4

Depth Estimation 4

Object Detection 4

Token Classification 3

Summarization 3

Image To 3d 3

Audio Classification 3

Text To 3d 3

Feature Extraction 3

Translation 3

Tabular Classification 3

Reinforcement Learning 2

Video Text To Text 2

Audio To Audio 2

Multiple Choice 2

Sentence Similarity 2

Keypoint Detection 1

Image To Video 1

Table Question Answering 1

Zero Shot Classification 1

Text Retrieval 1

Conversational 1