TeluguWaC
The corpus was prepared using the Corpus Factory method described here. Full details are given in Kilgarriff et al. (LREC 2010).
Changelog
v2.0 (17th Jan 2012)
The corpus is tagged using a new POS tagger (90.73% accuracy), a lemmatizer, and a morphological analyzer, all downloaded from http://sivareddy.in/downloads
The tagset is described in http://ltrc.iiit.ac.in/tr031/posguidelines.pdf
We wrote a simple sketch grammar for Telugu and used it to generate word sketches and a distributional thesaurus for the language. If you would like to contribute, please contact us.
