Sketch Engine

JpWaC Japanese Web Corpus

The corpus was prepared by Tomaž Erjavec from a list of URLs provided by Serge Sharoff at the University of Leeds. The URLs were collected using a method designed to produce a general language resource. There has been little checking of the content.

The corpus was segmented, part-of-speech tagged and lemmatised using ChaSen, an open-source toolset for Japanese.

Word sketches were prepared by Irena Srdanovic.

Brought to you by
Lexical Computing Ltd