Turkish WaC
The corpus was prepared with the Corpus Factory method. Full details are given in Kilgarriff et al. (LREC 2010).
Changelog
v2.0 (25 Oct 2011)
Word Sketches were compiled using the following resources:
The morphological analyzer (Kemal Oflazer) and the morphological disambiguator, i.e. POS tagger (Deniz Yüret), can be downloaded from http://www3.itu.edu.tr/~gulsenc/tfeaturesn.rar and http://deniz.yuret.com/turkish/tr-disamb.tgz respectively.
The Word Sketches are generated from the output of an existing dependency parser, which can be downloaded from http://web.itu.edu.tr/gulsenc/TurkishDepModel.html
We would like to thank Gülşen Eryiğit and Kemal Oflazer for answering our emails and providing us with the tools.
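This page does not say exactly how the parser output was turned into Word Sketches. As a rough illustration only, the following sketch assumes the parser writes CoNLL-X style tab-separated rows (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, ...). It counts one (head lemma, relation, dependent lemma) triple per dependency arc and ranks each headword's collocates per relation with logDice (14 + log2 of the Dice coefficient), a salience score used by Sketch Engine. The function names, the toy sentences and the relation labels are hypothetical. Real Turkish parser output may split words into inflectional groups, which this sketch does not handle.

```python
import math
from collections import Counter, defaultdict


def read_conll(text):
    """Yield sentences as lists of tab-separated CoNLL-X rows; blank lines end a sentence."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def extract_triples(sentences):
    """Count (head lemma, relation, dependent lemma) triples, one per dependency arc."""
    triples = Counter()
    for sent in sentences:
        lemma_by_id = {row[0]: row[2] for row in sent}  # ID -> LEMMA
        for row in sent:
            head_id, deprel = row[6], row[7]  # HEAD and DEPREL columns
            if head_id == "0":  # arc from the artificial root: no head word
                continue
            triples[(lemma_by_id[head_id], deprel, row[2])] += 1
    return triples


def word_sketch(triples, headword):
    """Group the collocates of `headword` by relation, ranked by logDice."""
    f_rel = Counter()  # f_x: how often headword heads each relation
    f_dep = Counter()  # f_y: how often each lemma is a dependent in any relation
    for (head, rel, dep), n in triples.items():
        if head == headword:
            f_rel[rel] += n
        f_dep[dep] += n
    sketch = defaultdict(list)
    for (head, rel, dep), f_xy in triples.items():
        if head != headword:
            continue
        score = 14 + math.log2(2 * f_xy / (f_rel[rel] + f_dep[dep]))
        sketch[rel].append((dep, f_xy, round(score, 2)))
    for rel in sketch:
        sketch[rel].sort(key=lambda item: item[2], reverse=True)
    return dict(sketch)


# Hypothetical toy parser output ("Ali kitap okudu" etc.); spaces become tabs.
SAMPLE = """1 Ali Ali Noun Prop _ 3 SUBJECT _ _
2 kitap kitap Noun Noun _ 3 OBJECT _ _
3 okudu oku Verb Verb _ 0 ROOT _ _

1 Ayşe Ayşe Noun Prop _ 3 SUBJECT _ _
2 kitap kitap Noun Noun _ 3 OBJECT _ _
3 okudu oku Verb Verb _ 0 ROOT _ _

1 Ali Ali Noun Prop _ 3 SUBJECT _ _
2 gazete gazete Noun Noun _ 3 OBJECT _ _
3 okudu oku Verb Verb _ 0 ROOT _ _
""".replace(" ", "\t")

if __name__ == "__main__":
    triples = extract_triples(read_conll(SAMPLE))
    for rel, collocates in word_sketch(triples, "oku").items():
        print(rel, collocates)
```

On the toy input, the sketch for the verb lemma "oku" (read) lists "Ali" and "Ayşe" under SUBJECT and "kitap" and "gazete" under OBJECT, with the more frequent collocate ranked first in each relation.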
