ukWaC British English web corpus
The corpus was prepared by Marco Baroni and Adriano Ferraresi. The process is described in Ferraresi (2007), "Building a very large corpus of English obtained by Web crawling: ukWaC", MA thesis, University of Bologna. It uses the same methods as those described for the German and Italian web corpora.
All material is taken from the .uk domain. It was part-of-speech tagged and lemmatised using TreeTagger, a widely used part-of-speech tagger that has been trained for a number of languages.
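For readers who want to work with tagged and lemmatised text of this kind, the following is a minimal sketch of reading TreeTagger-style output in Python. It assumes the common one-token-per-line layout with three tab-separated fields (word form, part-of-speech tag, lemma). The names Token and parse_vertical are hypothetical, and the sample lines are hand-written for illustration, not taken from the corpus. The actual distribution format of the corpus may differ.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """One tagged token: surface form, part-of-speech tag, lemma."""
    word: str
    pos: str
    lemma: str


def parse_vertical(lines: Iterable[str]) -> Iterator[Token]:
    """Yield tokens from tab-separated word/tag/lemma lines.

    Assumption: lines starting with "<" are structural markup
    (for example sentence or document boundaries) and are skipped,
    as are blank lines and lines without exactly three fields.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("<"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        yield Token(*parts)


# Hand-written illustrative input, not real corpus data.
sample = [
    "<s>\n",
    "The\tDT\tthe\n",
    "corpora\tNNS\tcorpus\n",
    "were\tVBD\tbe\n",
    "tagged\tVVN\ttag\n",
    ".\tSENT\t.\n",
    "</s>\n",
]

tokens = list(parse_vertical(sample))
lemmas = [t.lemma for t in tokens]
```

Keeping the word form and the lemma side by side is what makes lemma-based searches possible: a query for the lemma "be" would match "were" in the sample above.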
The grammatical relation definitions are those prepared by David Tugwell for other English corpora.
