Turkish WaC

The corpus was prepared using the Corpus Factory method described here. Full details are given in Kilgarriff et al. (LREC 2010).

Changelog

v2.0 (25 Oct 2011)

Word Sketches are compiled using the resources below.

The morphological analyzer and the morphological disambiguator (POS tagger) are from Kemal Oflazer and Deniz Yüret, and can be downloaded from http://www3.itu.edu.tr/~gulsenc/tfeaturesn.rar and http://deniz.yuret.com/turkish/tr-disamb.tgz respectively.

Word Sketches are generated from the output of an existing dependency parser. The dependency parser can be downloaded from http://web.itu.edu.tr/gulsenc/TurkishDepModel.html
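
As a rough illustration of how dependency parser output can be grouped into sketch-like data, the following minimal Python sketch counts collocates per headword and relation. The token layout, the relation labels and the function names are assumptions made for this example; they are not the parser's actual output format or tag set, and this is not the pipeline used to build the corpus. Real Word Sketches also rank collocates by a salience score, which is omitted here.

```python
from collections import Counter, defaultdict

# Hypothetical parsed sentence: (token_id, lemma, head_id, relation).
# head_id 0 marks the root. Relation labels are illustrative only.
sentence = [
    (1, "çocuk", 3, "SUBJECT"),
    (2, "kitap", 3, "OBJECT"),
    (3, "oku", 0, "ROOT"),
]


def dependency_triples(tokens):
    """Yield (head_lemma, relation, dependent_lemma) for every non-root token."""
    lemma_by_id = {tok_id: lemma for tok_id, lemma, _, _ in tokens}
    for _, lemma, head_id, relation in tokens:
        if head_id != 0:
            yield lemma_by_id[head_id], relation, lemma


def build_sketch(sentences):
    """Count how often each dependent occurs per (headword, relation) pair."""
    sketch = defaultdict(Counter)
    for tokens in sentences:
        for head, relation, dependent in dependency_triples(tokens):
            sketch[(head, relation)][dependent] += 1
    return sketch


# Two copies of the same sentence stand in for a (tiny) parsed corpus.
sketch = build_sketch([sentence, sentence])
```

Looking up `sketch[("oku", "OBJECT")]` then gives the raw frequency of each object collocate of the verb, which is the kind of table a Word Sketch presents for one grammatical relation.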

We would like to thank Gülşen Eryiğit and Kemal Oflazer for answering our emails and providing us with the tools.