Task
General
This dataset includes pre-processed wikipedia data for tokenizer evaluation in 45 languages. We provide more information on the evaluation task in general this blogpost.
This dataset includes pre-processed wikipedia data for tokenizer evaluation in 45 languages. We provide more information on the evaluation task in general this blogpost.