
Helping Our Own (HOO)

On the one hand, identifying grammatical and other linguistic errors in a text is a major challenge for language technology.

On the other, the majority of authors of computational linguistics papers are not native speakers of English. For many of them, and for some native speakers too, writing publication-quality papers in English that follow the conventions of the field is very hard, and failure to do so may block good researchers from getting their work published.

In this project we aim to bring the research problem and the practical problem together by defining a task, 'error correction for draft computational linguistics papers', and encouraging researchers (who may well be the very researchers who hope to benefit from successful solutions) to participate in a shared task.

We follow the usual shared-task methodology: we define the task in some detail and prepare datasets of manually-corrected draft CL papers, one set for participants to use for developing their algorithms, and another set to be used for evaluation. We announce a schedule and encourage participation. Participants are then given a short period to download the evaluation dataset, process it with their tools, and return the output, which is then scored against the manual 'gold standard'. Finally we hold a workshop to present findings, compare methods, and plan the way forward.
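To make the scoring step concrete, here is a minimal sketch of how system output might be compared with a gold standard. It assumes, purely for illustration, that each correction is recorded as a token span plus replacement text, and that a system edit counts as correct only if it matches a gold edit exactly. The `Edit` type, the `score` function and the choice of precision, recall and F-score are all assumptions made for this sketch, not the task's defined evaluation scheme.

```python
from typing import NamedTuple, Set, Tuple


class Edit(NamedTuple):
    """Hypothetical edit record: replace tokens [start, end) with `correction`."""
    start: int
    end: int
    correction: str


def score(system: Set[Edit], gold: Set[Edit]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) using exact-match comparison of edits."""
    matched = len(system & gold)  # edits proposed by the system that are also in gold
    precision = matched / len(system) if system else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data: three gold edits; the system proposes two, one of which is right.
gold = {Edit(0, 1, "The"), Edit(4, 5, "are"), Edit(9, 9, "a")}
system = {Edit(0, 1, "The"), Edit(4, 5, "is")}

precision, recall, f_score = score(system, gold)
print(f"P={precision:.2f} R={recall:.2f} F={f_score:.2f}")
```

Exact matching is the strictest option. A real evaluation might also give credit for detecting an error at the right place with a different correction, which this sketch does not attempt.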

We think that the ACL Anthology Reference Corpus may be a particularly useful resource, as it embodies the target text type (though not without errors); it is also available through a user-friendly corpus interface. We also think that this task, domain- and register-specific error correction, may contrast in interesting ways with 'vanilla', general-purpose error correction. We shall take Microsoft Word's grammar checker as a reference system.

HOO has been supported by the Generation Challenges project and will culminate at ENLG 2011.

Organisers

Robert Dale and Adam Kilgarriff
