ItWaC Italian Web Corpus

The corpus was built by Marco Baroni from a web crawl, as described in a paper presented at EACL 2006 (paper available here).

It was part-of-speech tagged and lemmatised using TreeTagger, an open-source part-of-speech tagger that has been trained for a number of languages.
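
As a rough illustration of what tagged and lemmatised text looks like downstream, the sketch below reads lines in the tab-separated token / tag / lemma layout that TreeTagger can emit when asked to print tokens and lemmas. This is an assumption about the layout, not a description of how ItWaC itself is distributed; the `Token` type, the `parse_tagged_lines` helper, the sample sentence and the tag labels are all hypothetical and chosen only for the example.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """One tagged token: surface form, part-of-speech tag, lemma."""
    word: str
    pos: str
    lemma: str


def parse_tagged_lines(lines: Iterable[str]) -> Iterator[Token]:
    """Yield a Token for every line holding exactly three tab-separated fields.

    Lines with any other shape (blank lines, structural markup such as <s>)
    are skipped rather than treated as errors.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        yield Token(*parts)


# Hypothetical sample; the tag labels are illustrative only.
sample = [
    "<s>\n",
    "Il\tDET\til\n",
    "gatto\tNOUN\tgatto\n",
    "dorme\tVERB\tdormire\n",
    "</s>\n",
]

tokens = list(parse_tagged_lines(sample))
lemmas = [t.lemma for t in tokens]
```

Keeping the lemma next to the surface form is what lets a query for "dormire" also match inflected forms such as "dorme".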

Word sketches were prepared by Marco Baroni and later updated by Valentina Efrati and Francesca Masini (TRIPLE lab, Roma Tre University).