HindiWaC
The corpus was prepared using the Corpus Factory method described here. Full details are given in Kilgarriff et al. (LREC 2010).
Changelog
v2.0 (6th Jan 2012)
The corpus is tagged using the POS tagger downloaded from http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php.
The tagset details are described in http://ltrc.iiit.ac.in/tr031/posguidelines.pdf
We wrote a simple sketch grammar for Hindi and used it to generate the first word sketches for the language. If you would like to contribute, please contact us.
v3.0 (17th Jan 2012)
We re-collected the Hindi Web Corpus in 2011. The size of the corpus is (size to be added).
The corpus is tagged using a new POS tagger (91.31% accuracy), a lemmatizer and a morphological analyzer, all downloaded from http://sivareddy.in/downloads
The tagset details are described in http://ltrc.iiit.ac.in/tr031/posguidelines.pdf
The sketch grammar has been revised with new rules that make use of postposition markers, which are crucial in Hindi dependency parsing. More rules are to be added. We invite collaboration from interested parties.
