Task
General
What is Symato CC? A project to download all WARC data from Common Crawl, then extract the Vietnamese content in Markdown and plaintext formats. About 1% of Common Crawl is Vietnamese, so extracting all of it should yield a lot of data (~10 TB of plaintext).
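To make the "filter out Vietnamese" step concrete, here is a minimal sketch of a character-based heuristic in Python. It is not Symato CC's actual filter, which is not described here. The function names, the choice of marks, and the 0.2 threshold are all assumptions for illustration. The idea is that Vietnamese text is dense in a few diacritics (horn, hook above, dot below) and the letter đ, which are rare in most other Latin-script languages.

```python
import unicodedata

# Combining marks that are fairly distinctive for Vietnamese after NFD
# decomposition: horn (ơ, ư), hook above (ả, ỏ), dot below (ạ, ệ).
# This selection is an assumption made for this sketch.
DISTINCTIVE_MARKS = {"\u031b", "\u0309", "\u0323"}


def looks_vietnamese(word: str) -> bool:
    """Return True if the word contains 'đ' or a distinctive combining mark."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "đ" in decomposed or any(ch in DISTINCTIVE_MARKS for ch in decomposed)


def vietnamese_ratio(text: str) -> float:
    """Fraction of whitespace-separated words that look Vietnamese."""
    words = text.split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if looks_vietnamese(w))
    return hits / len(words)


def is_probably_vietnamese(text: str, threshold: float = 0.2) -> bool:
    """Hypothetical cutoff; a real pipeline would tune it on labeled pages."""
    return vietnamese_ratio(text) >= threshold


if __name__ == "__main__":
    print(is_probably_vietnamese("Tôi là người Việt Nam và tôi yêu tiếng Việt"))
    print(is_probably_vietnamese("The quick brown fox jumps over the lazy dog"))
```

A heuristic like this is cheap enough to run as a first pass over very large inputs. A trained language identifier would be more accurate on short or mixed-language pages.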