
Indonesian WaC

The corpus was prepared using the Corpus Factory method described here. Full details are given in Kilgarriff et al. (LREC 2010).

Changelog

v2.0 (5 May 2010)

Fixed tokenisation problems (the standard tokenisation program, unitok.py, is now used).