Histograms help
If a noun is usually in the plural, or a verb is usually in the passive, this is a salient fact about the word (possibly worth mentioning in a dictionary). We include these facts in the word sketch in the form of a histogram.
For example, 75% of the occurrences of the English noun constraint in the BNC are in the plural. How salient is this fact? To answer, we need to know whether this is unusual or typical behaviour for an English noun. That is, we need to assess the fact about constraint against the background of "other nouns".
We do this by
- calculating the percentage for all nouns,
- putting them in rank order, and
- seeing whether the keyword (e.g., constraint) is an exceptional case.
If it is, we add a histogram to the word sketch. To build the histogram we
- count the nouns with between 0 and 10%, 10 and 20% ... plurals,
- plot them in a figure with columns for 0-10% ... 90-100%, and
- colour red the column in which the keyword falls.
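The binning steps above can be sketched in Python. This is only an illustration, not the code the tool itself runs: the function names `bin_index` and `build_histogram` are hypothetical, the figures for every noun other than constraint are invented, and the choice to put a value lying exactly on a boundary (e.g., 10%) into the upper column is an assumption.

```python
def bin_index(percent, n_bins=10):
    """Map a percentage in [0, 100] to a column index.

    Assumption: a value exactly on a boundary (e.g., 10%) goes into the
    upper column, and 100% goes into the last column.
    """
    width = 100 / n_bins
    return min(int(percent // width), n_bins - 1)


def build_histogram(percentages, keyword, n_bins=10):
    """Count words per column and report which column holds the keyword."""
    counts = [0] * n_bins
    for pct in percentages.values():
        counts[bin_index(pct, n_bins)] += 1
    return counts, bin_index(percentages[keyword], n_bins)


# Toy data: only the 75% for "constraint" comes from the text;
# the other nouns and their figures are invented for illustration.
plural_pct = {
    "constraint": 75.0,
    "noun_a": 3.0,
    "noun_b": 28.0,
    "noun_c": 41.0,
    "noun_d": 100.0,
}

counts, highlighted = build_histogram(plural_pct, "constraint")
print(counts)       # number of nouns in each column, 0-10% ... 90-100%
print(highlighted)  # index of the column to colour red
```

Here constraint falls in the 70-80% column, so that is the column that would be coloured.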
By default, a word counts as noteworthy (so that we add a histogram to its word sketch) if it falls in the highest or lowest 10% of the population.
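The rank-order test might look roughly like the following Python sketch. It makes several assumptions: the helper names `rank_percentile` and `is_noteworthy` are hypothetical, ties are handled by counting only strictly lower or strictly higher words, and all figures except the 75% for constraint are invented.

```python
def rank_percentile(percentages, keyword):
    """Percentage of words whose value is strictly below the keyword's."""
    value = percentages[keyword]
    below = sum(1 for v in percentages.values() if v < value)
    return 100.0 * below / len(percentages)


def is_noteworthy(percentages, keyword, tail=10.0):
    """True if the keyword lies in the lowest or highest `tail` percent.

    Assumption: a word is in the lowest tail if fewer than `tail` percent
    of words lie strictly below it, and likewise for the highest tail.
    """
    value = percentages[keyword]
    n = len(percentages)
    below = sum(1 for v in percentages.values() if v < value)
    above = sum(1 for v in percentages.values() if v > value)
    return 100.0 * below / n < tail or 100.0 * above / n < tail


# Toy population of ten nouns; only constraint's 75% comes from the text.
plural_pct = {
    "constraint": 75.0,
    "noun_a": 2.0, "noun_b": 10.0, "noun_c": 18.0,
    "noun_d": 25.0, "noun_e": 30.0, "noun_f": 36.0,
    "noun_g": 44.0, "noun_h": 52.0, "noun_i": 60.0,
}

print(rank_percentile(plural_pct, "constraint"))  # position in the rank order
print(is_noteworthy(plural_pct, "constraint"))    # top of the toy ranking
print(is_noteworthy(plural_pct, "noun_f"))        # middle of the toy ranking
```

In this toy population constraint has the highest plural percentage, so it would receive a histogram, while a noun in the middle of the ranking would not.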
By moving the mouse over the histogram, you can see the percentage of times the word is in the plural/passive/... and also how 'unusual' this is, i.e., where the word comes in the rank order (expressed as a percentage).
The approach is currently being tested: parameters will probably change. We currently produce histograms (for the BNC only) for:
- nouns: % in plural
- verbs: % in passive and % in present-participle form
