
Creating Subcorpora for Sharing with All Users

In Sketch Engine, users create subcorpora in their own namespace: each user has their own subcorpora and cannot access the subcorpora of other users.

To share common subcorpora, you can create a list of subcorpora that are accessible to all users (so-called "global subcorpora"). The list of global subcorpora is defined in a subcorpus definition file. An example (subcdef.txt) is attached here, with instructions on the format provided at the start of the file.

To compile the shared (global) subcorpora, use either the Corpus Architect (CA) interface or the mksubc.py script.

1) via Corpus Architect interface

  • Once you have created your subcorpus definition file, you need to:

- upload the definition

  • go to the home page (corpora overview)
  • click Subcorpus definitions in the left-hand menu
  • click Add new subcorpus definition file at the bottom right
  • locate the definition file on your computer and upload it
  • enter the name by which the file should be referred to within Sketch Engine and click OK

Note that your uploaded definition files can be shared with other users. This allows the other users to compile subcorpora using your definition file or to view the file itself. This is *not* necessary for sharing the actual subcorpora you have compiled for a given corpus with other users.

- recompile the corpus

  • once you have uploaded a subcorpus definition file to the server, or someone has shared their definition with you, open the corpus by clicking on its name (this works only for user corpora, not for the preloaded ones)
  • select Set subcorpus definitions in the left-hand menu (if the label is grayed out, make sure the corpus is already compiled)
  • choose the definition file you want to use
  • tick the Recompile subcorpora checkbox and click OK
  • if the compilation finishes without errors, all users who have access to the corpus will also see the newly created subcorpora

2) using mksubc.py script

Usage: mksubc.py CORPNAME SUBCORP_DIR SUBCORP_DEF_FILE

SUBCORP_DIR is the directory where the subcorpora will be created; its location depends on the Sketch Engine installation.

Note that mksubc.py is run by compilecorp (see SkE/CompilingCorpus)
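
For scripted setups, the call can be wrapped in a few lines of Python. This is only a minimal sketch: it assumes mksubc.py is executable and on the PATH, and the helper names, the corpus name and the paths are hypothetical placeholders. The only thing taken from the usage line above is the order of the three positional arguments.

```python
import subprocess
from pathlib import Path


def build_mksubc_command(corpname, subcorp_dir, subcorp_def_file, script="mksubc.py"):
    """Build the argument list in the documented order:
    mksubc.py CORPNAME SUBCORP_DIR SUBCORP_DEF_FILE
    """
    return [script, corpname, str(subcorp_dir), str(subcorp_def_file)]


def compile_global_subcorpora(corpname, subcorp_dir, subcorp_def_file):
    """Run mksubc.py for one corpus (hypothetical helper, not part of Sketch Engine)."""
    # Fail early with a clear message if the definition file is missing.
    if not Path(subcorp_def_file).is_file():
        raise FileNotFoundError(f"subcorpus definition file not found: {subcorp_def_file}")
    cmd = build_mksubc_command(corpname, subcorp_dir, subcorp_def_file)
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values: the real SUBCORP_DIR depends on the installation.
    print(" ".join(build_mksubc_command("mycorpus", "/path/to/subcorp", "subcdef.txt")))
```

The command is built as a list rather than a shell string so that corpus names or paths containing spaces are passed through unchanged.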

Attachments