Word List
Left-hand side options:
- select All words to generate a list of words in the corpus ranked by frequency
- select All lemmas to generate a list of lemmas in the corpus ranked by frequency. A lemma is the base (dictionary) form of a word; for example, "go" is the lemma of "goes", "went" and "gone".
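As a rough illustration of what these two options produce, the sketch below builds frequency-ranked lists from a toy corpus. The `frequency_list` helper and the (word, lemma) sample data are hypothetical and are not part of the interface; a real corpus would supply the lemma attribute itself.

```python
from collections import Counter

def frequency_list(tokens):
    """Return (item, count) pairs ranked by descending frequency."""
    counts = Counter(tokens)
    # Sort by count (descending), then alphabetically so ties are stable.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

# Hypothetical toy corpus given as (word, lemma) pairs.
corpus = [
    ("cats", "cat"),
    ("sat", "sit"),
    ("cat", "cat"),
    ("sits", "sit"),
    ("cat", "cat"),
]

all_words = frequency_list(word for word, _ in corpus)
all_lemmas = frequency_list(lemma for _, lemma in corpus)

print(all_words)
print(all_lemmas)
```

The lemma list is shorter than the word list because inflected forms such as "cat" and "cats" are counted under one entry.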
In the main panel of the interface you have further options:
- Subcorpus: specify a subcorpus to use as the source data, or create a new one.
- Search Attribute: you can specify word, lemma, tag (part-of-speech tag), etc., depending on the attributes defined for the corpus, or you can specify one of the text types defined for the corpus. The default attribute is word.
Filter wordlist
You can generate the list for all words (or lemmas, or whichever attribute you specify), or you can filter the list using:
- RE pattern: a regular expression pattern. "." matches any single character and "*" repeats the preceding element zero or more times, so ".*" acts as a wildcard and "ca.*" generates a list of all items (words by default) starting with "ca". The Search Attribute field determines what the pattern is matched against. The default attribute is word, but you could select lemma, tag (part-of-speech tag) or lc (lower case).
- Minimum Frequency: minimum frequency in that corpus or subcorpus
- Whitelist: upload a list of words (items) that should be included in this list. This is handy in case you have a list of words and want to find out their frequencies in a particular corpus. If you upload a file with such words as a whitelist, you get the word list with frequencies just for your words.
- Blacklist: upload a list of words (items) that should be excluded from this list
- a checkbox to include non-words, punctuation, etc. (what counts as a non-word is based on a regular expression in the configuration file)
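The filters above can be combined. The sketch below shows one plausible way they interact on a small frequency table. The `filter_wordlist` function and its parameter names are hypothetical, and it assumes that the pattern must match the whole item, which agrees with "ca.*" selecting items that start with "ca".

```python
import re

def filter_wordlist(freqs, pattern=".*", min_freq=1, whitelist=None, blacklist=None):
    """Filter an {item: frequency} mapping and rank what is left by frequency."""
    regex = re.compile(pattern)
    kept = {}
    for item, freq in freqs.items():
        # Assumption: the pattern has to match the entire item.
        if not regex.fullmatch(item):
            continue
        if freq < min_freq:
            continue
        # A whitelist restricts the output to the listed items only.
        if whitelist is not None and item not in whitelist:
            continue
        # A blacklist drops the listed items.
        if blacklist is not None and item in blacklist:
            continue
        kept[item] = freq
    return sorted(kept.items(), key=lambda pair: (-pair[1], pair[0]))

# Hypothetical frequencies for illustration.
freqs = {"cat": 12, "car": 30, "care": 4, "dog": 25, "scar": 9}

result = filter_wordlist(freqs, pattern="ca.*", min_freq=5, blacklist={"car"})
print(result)
```

Here "scar" fails the pattern, "care" falls below the minimum frequency, and "car" is blacklisted, so only "cat" remains.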
Frequency figures
- You can change the measure that is used to rank your word list and that is shown alongside each item. The available measures are word counts, document counts or ARF (average reduced frequency; see SkE/Help/JargonBuster for details on the ARF statistic).
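To give a feel for how ARF differs from a plain word count, here is a minimal sketch of the statistic as it is usually defined: with N tokens and f occurrences, v = N/f, and ARF is the sum of min(d, v) over the cyclic distances d between consecutive occurrences, divided by v. The function name and the token positions are hypothetical, and the tool's own implementation may differ in detail.

```python
def average_reduced_frequency(positions, corpus_size):
    """Compute ARF from the token positions of one item in a corpus."""
    positions = sorted(positions)
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f  # average gap if occurrences were evenly spread
    # Distance from the last occurrence round to the first one (cyclic).
    distances = [positions[0] + corpus_size - positions[-1]]
    # Distances between consecutive occurrences.
    distances += [b - a for a, b in zip(positions, positions[1:])]
    return sum(min(d, v) for d in distances) / v

# Four occurrences in a 100-token corpus, evenly spread vs. clustered together.
even = average_reduced_frequency([0, 25, 50, 75], 100)
clumped = average_reduced_frequency([10, 11, 12, 13], 100)
print(even, clumped)
```

An evenly distributed item keeps an ARF equal to its raw count, while an item whose occurrences cluster in one place is pushed down towards 1.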
Output type
- Keywords: use this option to generate keywords obtained when contrasting the corpus or subcorpus with a reference corpus or subcorpus.
- You can use the multilevel option to produce lists with more than one attribute (see also Using MultiLevel lists)
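The Keywords option compares a focus corpus with a reference corpus. The page does not say which statistic is used, so the sketch below assumes a simple smoothed ratio of per-million frequencies; `keyword_scores`, the `smoothing` parameter and the counts are all hypothetical.

```python
def keyword_scores(focus_counts, focus_size, ref_counts, ref_size, smoothing=1.0):
    """Rank items by how much more frequent they are in the focus corpus."""
    def per_million(count, size):
        return count * 1_000_000 / size

    scores = {}
    for item, count in focus_counts.items():
        focus_fpm = per_million(count, focus_size)
        ref_fpm = per_million(ref_counts.get(item, 0), ref_size)
        # Assumption: the smoothing constant avoids division by zero for
        # items that are missing from the reference corpus.
        scores[item] = (focus_fpm + smoothing) / (ref_fpm + smoothing)
    return sorted(scores.items(), key=lambda pair: -pair[1])

# Hypothetical counts: a 100,000-token focus corpus and a
# 1,000,000-token reference corpus.
focus_counts = {"corpus": 50, "the": 6000}
ref_counts = {"corpus": 10, "the": 60000}

ranked = keyword_scores(focus_counts, 100_000, ref_counts, 1_000_000)
print(ranked)
```

Items with the same relative frequency in both corpora score close to 1, while items typical of the focus corpus rise to the top of the list.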
