JQL-AI/hplt2_embeddings

Feature Extraction · JQL-AI · 10.8K
Unknown · 8.2 TB · task_categories:feature-extraction · language:sq · language:bg · language:ca · language:cs

HPLT2-embeddings is an extension of the HPLT2 dataset, annotated with document-level embeddings from Snowflake's Arctic-embed-m-v2.0 model for 35 languages. The embeddings make the dataset useful for tasks such as document clustering, filtering, and other multilingual research.
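As a rough illustration of the filtering use case, the sketch below keeps only the documents whose embedding is close to a reference embedding, measured by cosine similarity. It does not load the dataset: the toy 3-dimensional vectors, the function names `cosine_similarity` and `filter_by_similarity`, and the threshold of 0.8 are all assumptions made for the example, and the real embeddings have a much higher dimension.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(embeddings, reference, threshold):
    """Return the indices of embeddings at least `threshold` similar to `reference`."""
    return [
        i
        for i, emb in enumerate(embeddings)
        if cosine_similarity(emb, reference) >= threshold
    ]


# Toy vectors standing in for document embeddings (hypothetical values).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
reference = [1.0, 0.0, 0.0]

kept = filter_by_similarity(toy_embeddings, reference, threshold=0.8)
print(kept)  # [0, 1]
```

In practice the reference vector might be the mean embedding of a few hand-picked documents, and at this dataset's scale a vectorized or approximate nearest-neighbor approach would likely replace the pure-Python loop.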

# download instantly
mlforge datasets pull JQL-AI/hplt2_embeddings

Dataset details

Task: Feature Extraction
Language: sq
License: Unknown
Size: 8.2 TB
Creator: JQL-AI
Downloads: 10.8K
Source: huggingface_datasets
Updated: 2025-08-21
