Sketch Engine

UKWaC British English web corpus

The corpus was prepared by Marco Baroni and Adriano Ferraresi. The construction process is described in Ferraresi (2007), Building a very large corpus of English obtained by Web crawling: ukWaC, MA thesis, University of Bologna. It follows the methods previously used for the corresponding German and Italian web corpora.

All material was crawled from the .uk domain. The texts were part-of-speech tagged and lemmatised with TreeTagger, a widely used part-of-speech tagger for which trained models exist for many languages.

The grammatical relation definitions used for this corpus are those David Tugwell prepared for other English corpora.

Sketch Engine
Bringing Corpora to the Masses

Lexical Computing Ltd