To use dynamic attributes, they have to be defined in the corpus configuration file.
DYNAMIC has to be set to the name of an internal function or to the name of a function from an external shared library.
DYNLIB has to be set to "internal" or to the name of the shared library, accordingly.
Internal functions
- striplastn (str, n) - returns str stripped of its last n characters
- lowercase (str, locale) - returns str in lowercase
- getfirstn (str, n) - returns the first n characters of str
- getnchar (str, n) - returns the n-th character of str
- getnextchars (str, attr, n) - returns the n characters after attr (a character)
- getnextchar (str, attr) - returns the character after attr (a character)
ATTRIBUTE lemma {
DYNAMIC striplastn
DYNLIB internal
ARG1 "2"
FUNTYPE i
FROMATTR lempos
TYPE index
}
ATTRIBUTE lc {
DYNAMIC lowercase
DYNLIB internal
ARG1 "C"
FUNTYPE s
FROMATTR word
TYPE index
TRANSQUERY yes
}
ATTRIBUTE tag {
DYNAMIC getfirstn
DYNLIB internal
ARG1 "3"
FUNTYPE i
FROMATTR ambtag
TYPE index
}
ATTRIBUTE k {
DYNAMIC getnchar
DYNLIB internal
ARG1 1
FUNTYPE i
FROMATTR tag
TYPE index
}
ATTRIBUTE g {
DYNAMIC getnextchar
DYNLIB internal
ARG1 "g"
FUNTYPE c
FROMATTR tag
TYPE index
}
ATTRIBUTE g3 {
DYNAMIC getnextchars
DYNLIB internal
ARG1 "g"
ARG2 3
FUNTYPE ci
FROMATTR tag
TYPE index
}
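To make the descriptions of the internal functions more concrete, here is a minimal C sketch of what striplastn, getfirstn and getnextchars are described as doing. It is not Manatee's actual code: the helper names (get_first_n, strip_last_n, get_next_chars) and the caller-supplied output buffer are assumptions made for illustration. The sketch also works on bytes, so multi-byte characters are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, NOT Manatee's implementation. They operate on
   bytes, so multi-byte (e.g. UTF-8) characters are not handled. */

/* Copy at most n leading bytes of src into dst (capacity cap, cap > 0). */
const char *get_first_n(const char *src, size_t n, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL && cap > 0);
    size_t keep = strlen(src);
    if (keep > n) keep = n;
    if (keep > cap - 1) keep = cap - 1;   /* never overflow dst */
    memcpy(dst, src, keep);
    dst[keep] = '\0';
    return dst;
}

/* Copy src without its last n bytes into dst. */
const char *strip_last_n(const char *src, size_t n, char *dst, size_t cap)
{
    assert(src != NULL);
    size_t len = strlen(src);
    return get_first_n(src, n < len ? len - n : 0, dst, cap);
}

/* Copy up to n bytes that follow the first occurrence of marker in src.
   dst is left empty when marker does not occur. */
const char *get_next_chars(const char *src, char marker, size_t n,
                           char *dst, size_t cap)
{
    assert(src != NULL && marker != '\0');
    const char *p = strchr(src, marker);
    return get_first_n(p != NULL ? p + 1 : "", n, dst, cap);
}
```

With a made-up lempos value "house-n", strip_last_n with n = 2 yields "house". With a made-up tag value "k1gMnSc1", get_next_chars with marker 'g' and n = 3 yields "MnS", which mirrors the ARG1 "g" / ARG2 3 settings of attribute g3.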
Shared library
The following example function takes the document's year of publication and determines the epoch the document comes from.
- the source code (epoch.c):
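#include <stdio.h>
/* Map a publication year (given as a string) to the label of its epoch. */
const char * epoch (char* year)
{
    int y = 0; /* stays 0 if year cannot be parsed, so such values fall into "before 1990" */
    sscanf(year, "%d", &y);
    if (y < 1990) return ("before 1990");
    if (y < 2001) return ("1990-2000");
    if (y < 2005) return ("2001-2004");
    if (y < 2009) return ("2005-2008");
    return ("2009 and later");
}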
- to compile the library use:
gcc -shared -fPIC -o epoch.so epoch.c
- the important part from the corpus configuration file:
STRUCTURE doc {
ATTRIBUTE year
ATTRIBUTE time {
DYNAMIC epoch
DYNLIB "/corpora/vert/greek/epoch.so"
FUNTYPE 0
FROMATTR year
TYPE index
TRANSQUERY yes
}
}
The dynamic attribute is not created when the corpus is compiled with encodevert; it has to be created additionally using mkdynattr:
mkdynattr <corpus> <dynattr>
For example:
mkdynattr gkwac0.5 doc.time
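Before wiring a shared library into the corpus configuration, it can help to check that the library loads and that the function returns what is expected. The following is a minimal sketch of such a check using the POSIX dlopen interface. The helper name call_dynattr is hypothetical, and the sketch says nothing about how Manatee itself loads the library. It assumes the function has the same signature as epoch in epoch.c.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Signature of epoch() as declared in epoch.c. */
typedef const char *(*dynattr_fn)(char *);

/* Hypothetical helper: load lib_path, call fn_name(input) and copy the
   result into out. Returns 0 on success, -1 if loading or lookup fails. */
int call_dynattr(const char *lib_path, const char *fn_name, char *input,
                 char *out, size_t cap)
{
    assert(out != NULL && cap > 0);

    void *handle = dlopen(lib_path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dynattr_fn fn = (dynattr_fn)dlsym(handle, fn_name);
    if (fn == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym: %s\n", err != NULL ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* Copy before dlclose: the returned string may live inside the library. */
    const char *res = fn(input);
    snprintf(out, cap, "%s", res != NULL ? res : "");
    dlclose(handle);
    return 0;
}
```

Called with the path to epoch.so, the function name "epoch" and a buffer holding "1999", this should fill out with "1990-2000". On older glibc versions the program may need to be linked with -ldl.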
