ItWaC Italian Web Corpus
The corpus was built by Marco Baroni from a web crawl, as described in a paper presented at EACL 2006.
It was part-of-speech tagged and lemmatised using TreeTagger, a widely used part-of-speech tagger for which trained models are available for a number of languages.
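As a rough illustration of what tagged and lemmatised text looks like, the sketch below reads tab-separated lines with one token per line in the form word, part-of-speech tag, lemma. This is the layout TreeTagger typically emits when token and lemma output are enabled. The function name `parse_tagged_lines`, the `Token` type, and the sample tags are assumptions made for this example; they are not taken from the ItWaC distribution or its tagset.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    """One tagged token: surface form, part-of-speech tag, lemma."""
    word: str
    pos: str
    lemma: str


def parse_tagged_lines(lines: Iterable[str]) -> List[Token]:
    """Parse 'word<TAB>tag<TAB>lemma' lines into Token tuples.

    Lines that do not have exactly three tab-separated fields
    (blank lines, structural markup such as <s>) are skipped.
    """
    tokens = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        tokens.append(Token(*parts))
    return tokens


# Illustrative input only; the tags here are placeholders.
sample = [
    "<s>\n",
    "il\tDET\til\n",
    "gatto\tNOUN\tgatto\n",
    "dorme\tVERB\tdormire\n",
    "</s>\n",
]

tokens = parse_tagged_lines(sample)
lemmas = [t.lemma for t in tokens]
```

Keeping the surface form and the lemma side by side in this way is what allows a corpus to be searched by either one.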
Word sketches were prepared by Marco Baroni.
