
bastao/VeraCruz_PT-BR

Text Generation · bastao · 25.6K downloads
License: Unknown · Size: 567 GB · Tags: task_categories:text-generation, task_categories:text-classification, language:pt, size_categories:100M<n<1B, format:parquet


# download instantly
mlforge datasets pull bastao/VeraCruz_PT-BR

Dataset details

Task: Text Generation
Language: pt
License: Unknown
Size: 567 GB
Rows / images: 190.3M
Creator: bastao
Downloads: 25.6K
Source: huggingface_datasets
Updated: 2025-07-21

About bastao/VeraCruz_PT-BR

Dataset Summary

The VeraCruz Dataset is a comprehensive collection of Portuguese-language content that showcases the linguistic and cultural diversity of Portuguese-speaking regions. It includes around 190 million samples, grouped into primary categories by regional origin, as indicated by URL metadata. The primary categories are:
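
As a rough illustration of what grouping samples by regional origin from URL metadata could look like, the sketch below assigns a region label from a URL's country-code top-level domain. The summary does not state the actual rule used to build the dataset, so everything here is an assumption: the `region_from_url` helper, the `TLD_TO_REGION` mapping, and the `Portugal`, `Brazil`, and `other` labels are hypothetical names chosen for the example.

```python
from urllib.parse import urlparse

# Hypothetical mapping from country-code TLD to a region label.
# The dataset's real categories and assignment rule are not given in the
# summary; these entries are assumptions for illustration only.
TLD_TO_REGION = {"pt": "Portugal", "br": "Brazil"}


def region_from_url(url: str) -> str:
    """Return a region label based on the last label of the URL's hostname."""
    # hostname is None when the URL has no scheme/netloc; fall back to "".
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return TLD_TO_REGION.get(tld, "other")


if __name__ == "__main__":
    for sample_url in (
        "https://www.exemplo.com.br/noticia",
        "https://exemplo.pt/artigo",
        "https://example.org/page",
    ):
        print(sample_url, "->", region_from_url(sample_url))
```

A TLD lookup is only one possible heuristic. It cannot place Portuguese-language pages hosted on generic domains such as `.com` or `.org`, which is why the sketch routes them to a catch-all label rather than guessing.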