KannadaWaC
The corpus was built using the Corpus Factory method, which is described in full in Kilgarriff et al. (LREC 2010).
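The first stage of the Corpus Factory approach can be illustrated roughly as follows. The Python sketch below builds a word-frequency list from tokenized text (for example, a Wikipedia dump), keeps a band of mid-frequency words as seeds, and combines them into random word tuples that can be sent to a search engine as queries. This is a minimal sketch under stated assumptions, not the code used to build KannadaWaC. The function names, the rank band, and the tuple size of 3 are illustrative choices. Tokenization, the search API calls, page download, cleaning and de-duplication are not shown.

```python
import math
import random
from collections import Counter


def seed_words(tokens, start_rank, end_rank):
    """Return the words ranked start_rank..end_rank-1 (0-based) by frequency.

    Skipping the top ranks drops function words; stopping at end_rank drops
    rare words and typos. The band itself is an assumption to tune.
    """
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[start_rank:end_rank]


def query_tuples(seeds, n_queries, tuple_size=3, rng=None):
    """Return n_queries distinct random tuples of seed words (hypothetical helper)."""
    rng = rng or random.Random()
    max_queries = math.comb(len(seeds), tuple_size)
    if n_queries > max_queries:
        raise ValueError(f"only {max_queries} distinct tuples are possible")
    queries = set()
    while len(queries) < n_queries:
        # Sort each sample so the same words in a different order count once.
        queries.add(tuple(sorted(rng.sample(seeds, tuple_size))))
    return sorted(queries)


if __name__ == "__main__":
    tokens = "a b b c c c d d d d e e e e e".split()
    seeds = seed_words(tokens, 1, 4)  # skip the single most frequent word
    print(seeds)                      # ['d', 'c', 'b']
    print(query_tuples(seeds, 1, rng=random.Random(0)))  # [('b', 'c', 'd')]
```

Because each sampled tuple is sorted, the same three words are never issued twice in a different order. A real seed list of a few thousand words allows far more distinct tuples than any practical query budget, so the sampling loop finishes quickly.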
Changelog
v2.0 (17th Jan 2012)
The corpus is now tagged with a new POS tagger (77.63% accuracy), lemmatizer and morphological analyzer, all available from http://sivareddy.in/downloads.
The tagset is described in http://ltrc.iiit.ac.in/tr031/posguidelines.pdf.
We wrote a simple sketch grammar for Kannada and generated word sketches and a distributional thesaurus for the language. If you would like to contribute, please contact us.
Reference for the corpus and tagger: http://www.aclweb.org/anthology-new/W/W11/W11-3603.pdf. The corpus was collected in 2011.
