esTenTen

esTenTen is the Spanish corpus of the TenTen family of web corpora.

The corpus is tagged with TreeTagger using the Spanish parameter file (UTF-8).
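As a rough illustration of working with TreeTagger-style output, the sketch below reads tab-separated lines of the form word, tag, lemma. It assumes the tagger was run so that each token line has exactly those three columns, and that structural markup such as `<s>` sits on its own line without tabs. The names `Token` and `parse_tagged_lines` and the sample sentence are invented for this example and are not taken from the corpus.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """One token line: surface form, part-of-speech tag, lemma."""
    word: str
    tag: str
    lemma: str


def parse_tagged_lines(lines: Iterable[str]) -> Iterator[Token]:
    """Yield a Token for every line with exactly three tab-separated fields.

    Lines with any other shape (blank lines, structural markup such as
    <s> or </s>) are skipped. This is an assumption about the input
    layout, not a description of how esTenTen itself is stored.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        yield Token(*parts)


# Hypothetical sample in the assumed word<TAB>tag<TAB>lemma layout.
sample = [
    "<s>",
    "El\tART\tel",
    "corpus\tNC\tcorpus",
    "es\tVSfin\tser",
    "grande\tADJ\tgrande",
    ".\tFS\t.",
    "</s>",
]

tokens = list(parse_tagged_lines(sample))
lemmas = [t.lemma for t in tokens]
```

Here `lemmas` holds the lemma column of the five token lines, with the two markup lines skipped.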

Changelog

v2.0 (30 September 2011)

  • removed Catalan and Galician texts
  • corpus size reduced by 79 million tokens

v1.0 (13 April 2011)

  • initial version -- 2.5 billion tokens
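
Taken together, the two changelog entries allow a rough estimate of the v2.0 size. The sketch below assumes the v1.0 figure of 2.5 billion tokens is a rounded value, so the result is an approximation and not an official count. The variable names are illustrative.

```python
# Figures from the changelog; the v1.0 size is assumed to be rounded.
V1_TOKENS_APPROX = 2_500_000_000   # v1.0: about 2.5 billion tokens
REMOVED_TOKENS = 79_000_000        # v2.0: Catalan and Galician texts removed

# Approximate v2.0 size and the relative reduction in percent.
v2_tokens_approx = V1_TOKENS_APPROX - REMOVED_TOKENS
reduction_pct = 100 * REMOVED_TOKENS / V1_TOKENS_APPROX

print(f"v2.0 size: about {v2_tokens_approx / 1e9:.2f} billion tokens")
print(f"reduction: about {reduction_pct:.1f}% of v1.0")
```

Under that assumption, v2.0 comes to roughly 2.42 billion tokens, a reduction of about 3% relative to v1.0.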