TenTen corpora

TenTen is a new generation of Web corpora. These corpora are created by Web crawling and processed with our latest boilerplate cleaning and de-duplication tools. The name "TenTen" refers to the target size of the corpora, which is 10^10 (10 billion) words.

Available corpora:

Newly available corpora: