If you are new to SkE, you might need to create several user accounts. To arrange this, contact xpomikal at fi dot muni dot com or rychly at gmail dot com.
There are two servers you will probably need access to: sketchengine.co.uk and corpora.fi.muni.cz. Depending on what you are going to do, you will need a user account in the corpadm group to access the machine and a separate administrative account for Sketch Engine. In addition, if you are reading this page as a guest, you will also need an account for the Trac wiki at trac.sketchengine.co.uk. That makes up to five different accounts: two per server plus one for the wiki.
Once you have obtained all your accounts, you can start creating your corpus.
Create the following pages: http://trac.sketchengine.co.uk/wiki/Corpora/<YourCorpusName> and http://trac.sketchengine.co.uk/wiki/Private/Corpora/<YourCorpusName>. The first should contain the information a regular user wants to know about the corpus (what it is about, how big it is, whom to contact, ...); the second should contain information that will help whoever continues your work on the corpus (where the sources are, your scripts, etc.).
Then log in to the SkE server:
- ssh <username>@sketchengine.co.uk or
- ssh <username>@corpora.fi.muni.cz
Familiarize yourself with the directory structure on the SkE server:
- /corpora/registry/ - registry files
- /corpora/manatee/ - compiled corpora (binaries)
- /corpora/vert/ - vertical files (analogous to /nlp/corpora/priprava_dat/)
- /corpora/wsdef/ - files with word sketch (WS) definitions
- /var/ske/registry/preloaded/ - registry files that the administration system processes (mostly symbolic links to /corpora/registry/)
- /var/ske/registry/preloaded/default/ - default corpora (those displayed as 'default' in the administration system); if you put something here, always make a symlink to /var/ske/registry/preloaded/
(This layout applies to sketchengine.co.uk; the corpora.fi.muni.cz tree is similar, but its paths mostly start with /nlp/corpora.)
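Since the preloaded directory mostly holds symbolic links into /corpora/registry/, making a new registry file visible to the administration system plausibly comes down to adding one more link. The following is a minimal sketch under that assumption, not a documented procedure: the function name link_registry and the corpus name mycorpus are hypothetical, and the two directory variables default to the sketchengine.co.uk paths listed above (on corpora.fi.muni.cz they would differ).

```shell
#!/bin/sh
# Sketch only: link a registry file into the preloaded directory.
# Defaults follow the sketchengine.co.uk layout; override them elsewhere.
REGISTRY_DIR="${REGISTRY_DIR:-/corpora/registry}"
PRELOADED_DIR="${PRELOADED_DIR:-/var/ske/registry/preloaded}"

# link_registry CORPUS  (hypothetical helper, not part of SkE)
link_registry() {
    corpus="$1"
    src="$REGISTRY_DIR/$corpus"
    dst="$PRELOADED_DIR/$corpus"

    # Refuse to create a dangling link.
    if [ ! -f "$src" ]; then
        echo "no registry file at $src" >&2
        return 1
    fi
    # Never overwrite an entry that is already there.
    if [ -e "$dst" ] || [ -L "$dst" ]; then
        echo "$dst already exists, leaving it untouched"
        return 0
    fi
    ln -s "$src" "$dst"
}
```

A call such as `link_registry mycorpus` can be repeated safely, because an existing entry is reported and left alone rather than replaced.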
Before creating corpora, study [SkE/PreparingCorpusOverview].
In addition to the regular lines, the configuration file of your corpus should contain:
MINOR "1"
- the corpus is not shown to the user in the list until they click on "more corpora"
LANGUAGE "language"
- language of the corpus
INFOHREF "url"
- link to information about the corpus; it should be external or point to the wiki page at http://trac.sketchengine.co.uk/wiki/Corpora/CorpusName
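As an illustration of the three attributes above, the sketch below appends them to a registry file. Only the attribute names MINOR, LANGUAGE and INFOHREF come from this page; the helper name append_corpus_info, the corpus name mycorpus and the value "Czech" are hypothetical, and where exactly these lines belong among the regular ones is an assumption.

```shell
#!/bin/sh
# Sketch only: append the three extra attributes to a registry file.
# append_corpus_info REGISTRY_FILE LANGUAGE INFO_URL  (hypothetical helper)
append_corpus_info() {
    registry_file="$1"
    language="$2"
    info_url="$3"

    {
        printf 'MINOR "1"\n'                  # listed only under "more corpora"
        printf 'LANGUAGE "%s"\n' "$language"  # language of the corpus
        printf 'INFOHREF "%s"\n' "$info_url"  # page describing the corpus
    } >> "$registry_file"
}
```

For example, `append_corpus_info /corpora/registry/mycorpus "Czech" "http://trac.sketchengine.co.uk/wiki/Corpora/mycorpus"` would add the three lines to the end of that file.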
Useful scripts
genws.sh CORPUS WSDEF_FILE
- generates word sketches (WS)
thes.sh CORPUS
- generates the thesaurus (genws.sh must be run first)
install_corpus_for_ws.sh USER CORPUS
- enables USER to create WS definitions for the corpus CORPUS in CorpusBuilder
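Because thes.sh depends on the output of genws.sh, the two are naturally run as a pair. The wrapper below is a rough sketch of that ordering: the names run and build_ws_and_thesaurus, the DRY_RUN switch, and the example file name mycorpus.wsdef are all hypothetical, and only the script names and argument order are taken from the list above. By default it only prints the commands, so nothing is executed until DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: run genws.sh and then thes.sh, stopping on the first failure.

# Print the command when DRY_RUN is 1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# build_ws_and_thesaurus CORPUS WSDEF_FILE  (hypothetical helper)
build_ws_and_thesaurus() {
    corpus="$1"
    wsdef="$2"
    run genws.sh "$corpus" "$wsdef" || return 1   # word sketches first
    run thes.sh "$corpus"                         # thesaurus needs them
}
```

With the default dry run, `build_ws_and_thesaurus mycorpus /corpora/wsdef/mycorpus.wsdef` prints the two commands in the order they would be executed.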
To make your corpus accessible from Sketch Engine, you need to add it to your user account in the Sketch Engine administration system: http://www.sketchengine.co.uk/admin/
