Preparing a Corpus for the Sketch Engine: Overview
To prepare a corpus for the Sketch Engine, we must
- Prepare the data, including both
  - SkE/PrepareText [the text]
  - SkE/PrepareHeaders [the header information] (if any)
- Prepare a SkE/CorpusConfig [corpus configuration file]
- Run the encodevert program, which compiles the prepared data into an indexed corpus.
(Here we assume a working Sketch Engine installation.)
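As a rough illustration of the first two steps, the Python sketch below renders tokenised documents in the one-token-per-line "vertical" format, with header information stored as attributes of a <doc> structure. It also builds a minimal corpus configuration that matches that output. The input layout (DOCS), the attribute names (word, lemma, tag), the structure names (doc, s) and the example paths are all hypothetical. The configuration uses only the basic NAME, PATH, VERTICAL, ENCODING, ATTRIBUTE and STRUCTURE directives. Check both outputs against SkE/PrepareText and SkE/CorpusConfig before use.

```python
# Hypothetical input: each document carries header metadata and a list of
# sentences; each sentence is a list of (word, lemma, tag) triples.
DOCS = [
    {
        "header": {"id": "d1", "genre": "news"},
        "sentences": [
            [("The", "the", "DT"), ("cat", "cat", "NN"),
             ("sat", "sit", "VBD"), (".", ".", "SENT")],
        ],
    },
]


def quote_attr(value: str) -> str:
    # Assumption: double quotes inside a header value are escaped as an
    # entity so they cannot terminate the attribute early.
    return value.replace('"', "&quot;")


def to_vertical(docs) -> str:
    """One token per line, positional attributes separated by tabs,
    structure boundaries (<doc>, <s>) on lines of their own."""
    lines = []
    for doc in docs:
        attrs = " ".join(f'{k}="{quote_attr(v)}"' for k, v in doc["header"].items())
        lines.append(f"<doc {attrs}>" if attrs else "<doc>")
        for sentence in doc["sentences"]:
            lines.append("<s>")
            for token in sentence:
                lines.append("\t".join(token))
            lines.append("</s>")
        lines.append("</doc>")
    return "\n".join(lines) + "\n"


def registry(name, path, vertical, header_keys) -> str:
    """A minimal corpus configuration describing to_vertical's output:
    three positional attributes, <doc> with header attributes, and <s>."""
    doc_attrs = "\n".join(f"    ATTRIBUTE {k}" for k in header_keys)
    return (
        f'NAME "{name}"\n'
        f'PATH "{path}"\n'
        f'VERTICAL "{vertical}"\n'
        'ENCODING "UTF-8"\n'
        "ATTRIBUTE word\n"
        "ATTRIBUTE lemma\n"
        "ATTRIBUTE tag\n"
        "STRUCTURE doc {\n"
        f"{doc_attrs}\n"
        "}\n"
        "STRUCTURE s\n"
    )


if __name__ == "__main__":
    print(to_vertical(DOCS))
    # Hypothetical corpus name and paths.
    print(registry("mycorp", "/corpora/data/mycorp",
                   "/corpora/vert/mycorp.vert", ["id", "genre"]))
```

The order of the tab-separated columns in the vertical file must match the order of the ATTRIBUTE lines in the configuration. This is why the sketch derives both from the same layout.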
This gives a corpus that can be queried for a range of concordances and lists. If word sketches are also required, we must additionally
- Prepare a grammatical relation definitions (gramrels) file: see SkE/CorpusQuerying
- Run the mkws.sh script.
This also builds the thesaurus. The thesaurus needs no additional input, because it is computed from the word sketch database.
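To show why the thesaurus needs nothing beyond the word sketch database, the toy Python sketch below treats each headword's sketch as a set of weighted (gramrel, collocate) contexts. It then ranks other headwords by how much of that context they share. The sample triples are invented, and the similarity measure (weighted Jaccard: sum of minima over sum of maxima) is a plain stand-in. It is not the measure Sketch Engine itself uses.

```python
from collections import defaultdict

# Invented word sketch entries: (headword, gramrel, collocate, salience).
SKETCHES = [
    ("beer", "object_of", "drink", 9.1),
    ("beer", "modifier", "cold", 7.5),
    ("wine", "object_of", "drink", 8.7),
    ("wine", "modifier", "red", 8.0),
    ("wine", "modifier", "cold", 3.2),
    ("car", "object_of", "drive", 9.4),
]


def contexts(triples):
    """Map each headword to {(gramrel, collocate): salience}."""
    table = defaultdict(dict)
    for head, rel, coll, score in triples:
        table[head][(rel, coll)] = score
    return table


def similarity(a, b):
    """Weighted Jaccard overlap of two context vectors (stand-in measure)."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def thesaurus(triples, word, n=5):
    """Return up to n other headwords, most similar to `word` first."""
    table = contexts(triples)
    target = table.get(word, {})
    scored = [(other, similarity(target, ctx))
              for other, ctx in table.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    print(thesaurus(SKETCHES, "beer"))
```

Here "wine" ranks above "car" for "beer" because it shares the contexts drink (as object) and cold (as modifier). "car" shares none. The point is only that every input comes from the sketches themselves.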
