deTenTen
German TenTen corpus.
The corpus is tagged twice: by RFTagger (attribute tag, see its tagset reference) and by TreeTagger (attribute tt_tag, see its tagset reference).
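Because each tagger writes to its own attribute, a query has to name the attribute it targets. The sketch below assumes the corpus is searched through a CQL-style interface in which a token is matched as [attribute="regex"]; the source does not confirm this. The helper cql_token and both tag patterns are hypothetical illustrations, so check the real tag values against the two tagset references.

```python
def cql_token(attribute: str, pattern: str) -> str:
    """Build a CQL-style expression matching one token by attribute value.

    Assumption: the query interface accepts [attribute="regex"] expressions.
    """
    return f'[{attribute}="{pattern}"]'


# Hypothetical patterns: RFTagger tags are dot-separated feature strings,
# TreeTagger tags are short codes. Verify both against the tagset references.
rftagger_noun = cql_token("tag", r"N\..*")
treetagger_noun = cql_token("tt_tag", "NN")

print(rftagger_noun)
print(treetagger_noun)
```

Keeping the attribute name as an explicit argument makes it harder to apply a pattern written for one tagset to the other tagger's attribute by mistake.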
Changelog
v2.0 (28 April 2011)
- fixed part-of-speech tagging problems that caused major data loss in the previous version
- 2.8 billion tokens
v1.0 (30 November 2010)
- initial version
- 1.2 billion tokens
