TeluguWaC

The corpus was prepared using the Corpus Factory method. Full details are given in Kilgarriff et al. (LREC 2010).

Changelog

v2.0 (17th Jan 2012)

The corpus is tagged using a new POS tagger (90.73% accuracy), lemmatizer, and morphological analyzer, all downloaded from http://sivareddy.in/downloads

The tagset is described in http://ltrc.iiit.ac.in/tr031/posguidelines.pdf

We wrote a simple sketch grammar for Telugu and used it to generate word sketches and a distributional thesaurus. If you would like to contribute, please contact us.