KannadaWaC

The corpus was prepared using the Corpus Factory method. Full details are given in Kilgarriff et al. (LREC 2010).

Changelog

v2.0 (17th Jan 2012)

The corpus is tagged using a new POS tagger (77.63% accuracy), a lemmatizer, and a morphological analyzer, all available for download from http://sivareddy.in/downloads

The tagset is described in http://ltrc.iiit.ac.in/tr031/posguidelines.pdf

We wrote a simple sketch grammar for Kannada and generated word sketches and a distributional thesaurus. If you would like to contribute, please contact us.

Reference for the corpus and tagger: http://www.aclweb.org/anthology-new/W/W11/W11-3603.pdf

The corpus was collected in 2011.