Indonesian WaC
The corpus was prepared using the Corpus Factory method described here. Full details are given in Kilgarriff et al., LREC 2010.
Changelog
v2.0 (5 May 2010)
Fixed tokenisation problems (the standard tokenisation program unitok.py is used)