Preparing a Corpus for the Sketch Engine: Overview

To prepare a corpus for the Sketch Engine, we must:

  • Prepare the data, including both
  • Prepare a corpus configuration file (see SkE/CorpusConfig)
  • Prepare a subcorpus configuration file (see SkE/SubcorpusConfig). This step is needed if you wish to compile subcorpora that can be shared by multiple users.
  • Prepare a grammatical relations (gramrels) definition file (see SkE/CorpusQuerying). This step is needed if you require word sketches or a thesaurus, because the thesaurus is built from the word sketch database.
  • Compile the corpus (see compiling corpora)
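The overview above does not show what the prepared data and configuration file look like. The following is a minimal sketch, assuming the common Sketch Engine "vertical" input format (one token per line, tab-separated word, lemma and tag columns, with XML-style structure tags such as <doc> and <s>) and Manatee-style configuration directives (NAME, PATH, VERTICAL, ENCODING, ATTRIBUTE, STRUCTURE, WSDEF). The toy data, the helper names write_vertical and write_config, and all paths are hypothetical; check SkE/CorpusConfig for the directives your installation actually expects.

```python
import tempfile
from pathlib import Path

# Hypothetical toy data: document id -> list of sentences, each sentence a
# list of (word, lemma, tag) triples. Real data would come from a tagger.
DOCS = {
    "doc1": [
        [("The", "the", "DT"), ("cats", "cat", "NNS"),
         ("sleep", "sleep", "VBP"), (".", ".", "SENT")],
    ],
}


def write_vertical(docs, path):
    """Write documents in a one-token-per-line vertical format:
    tab-separated positional attributes, XML-style structure tags."""
    lines = []
    for doc_id, sentences in docs.items():
        lines.append(f'<doc id="{doc_id}">')
        for sentence in sentences:
            lines.append("<s>")
            for word, lemma, tag in sentence:
                lines.append(f"{word}\t{lemma}\t{tag}")
            lines.append("</s>")
        lines.append("</doc>")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def write_config(name, corpus_dir, vertical_path, config_path, gramrels=None):
    """Write a minimal corpus configuration file (assumed Manatee syntax).
    ATTRIBUTE lines are listed in the same order as the vertical columns."""
    config = [
        f'NAME "{name}"',
        f"PATH {corpus_dir}",
        f"VERTICAL {vertical_path}",
        'ENCODING "UTF-8"',
        "ATTRIBUTE word",
        "ATTRIBUTE lemma",
        "ATTRIBUTE tag",
        "STRUCTURE doc {",
        "    ATTRIBUTE id",
        "}",
        "STRUCTURE s",
    ]
    # The gramrels file is only needed for word sketches and the thesaurus.
    if gramrels:
        config.append(f"WSDEF {gramrels}")
    Path(config_path).write_text("\n".join(config) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Usage: write both files into a throwaway directory and show the config.
    with tempfile.TemporaryDirectory() as tmp:
        vert = Path(tmp) / "toy.vert"
        write_vertical(DOCS, vert)
        cfg = write_config("toy", Path(tmp) / "data", vert, Path(tmp) / "toy")
        print("\n".join(cfg))
```

In this sketch the WSDEF line is written only when a gramrels file is supplied, mirroring the optional step in the list. The resulting files would then go through the compilation step; the compiler command itself is not shown here.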