Thai WaC

The corpus was prepared using the Corpus Factory method. Full details are given in Kilgarriff et al. (LREC 2010).

The corpus is tokenised using the Swath word segmentation tool, which can be downloaded from http://www.cs.cmu.edu/~paisarn/software.html