New Corpus for Ireland – user’s guide

Welcome to the New Corpus for Ireland, a corpus created as part of the New English-Irish Dictionary project in Foras na Gaeilge.

The New Corpus for Ireland is a large collection of Irish-language texts containing approximately 30 million words. It covers a wide range of text types, including works of fiction, factual texts, news reports, official documents and much more. The corpus is designed for linguistic research – for example, to find examples of words used in context or to find out how frequent a word is.

This website enables you to consult the corpus in several different ways. The website is based on a corpus query system called Sketch Engine created by Lexical Computing Limited. This document will give you a basic introduction to the website and how to use it. A more detailed guide is available in Sketch Engine’s own help section: https://trac.sketchengine.co.uk/

The home page

On the home page, you can type a word or a multi-word term in Irish to search for it in the corpus. You will get three kinds of results:

Concordance: You will see a list of examples in which the word or term you searched for is used in a sentence. These examples were selected from the corpus automatically. The home page lists approximately ten examples and you can get more by clicking the “more” link.
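To make the idea of a concordance concrete, here is a minimal sketch in Python of a keyword-in-context listing. It is not how the website or Sketch Engine works internally; the function name `concordance`, the `width` parameter and the three sample sentences are all invented for illustration.

```python
def concordance(sentences, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword."""
    lines = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            # Compare case-insensitively, ignoring surrounding punctuation.
            if token.strip(".,;:!?").lower() == keyword.lower():
                left = " ".join(tokens[:i])[-width:]
                right = " ".join(tokens[i + 1:])[:width]
                # Right-align the left context so the keywords line up.
                lines.append(f"{left:>{width}} [{token}] {right}")
    return lines


# Toy sentences invented for this example; they are not taken from the corpus.
sample = [
    "Dhún sí an doras go ciúin.",
    "Bhí an fhuinneog ar oscailt.",
    "Tá an doras dúnta anois.",
]

for line in concordance(sample, "doras"):
    print(line)
```

Only the two sentences containing doras are listed, each with the search word aligned in the middle.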

Collocations: The box on the right-hand side gives the ten most frequent words that co-occur with the word you searched for. For example, if you search for doras ‘door’, you will get words such as oscail ‘open’, dúnta ‘closed’, plab ‘slam’ and others. Once again, you can see more of these words by clicking the “more” link. This list of collocates has been extracted from the corpus automatically.
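The following sketch shows one simple way a list of collocates could be computed: count the words that appear within a few positions of the search word. This is an illustration under assumptions, not the method the website uses; the function name `collocates`, the `window` size and the sample sentences are invented.

```python
from collections import Counter


def collocates(sentences, node, window=3, top=10):
    """Count words occurring within `window` tokens of `node`."""
    counts = Counter()
    for sentence in sentences:
        tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
        for i, token in enumerate(tokens):
            if token == node:
                start = max(0, i - window)
                # Words to the left and right of the node, excluding the node itself.
                neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
                counts.update(neighbours)
    return counts.most_common(top)


# Toy sentences invented for this example; they are not taken from the corpus.
sample = ["Oscail an doras.", "Dún an doras.", "Tá an doras dúnta."]

print(collocates(sample, "doras"))
```

Raw counts like these favour very common words such as an ‘the’. A real corpus tool would normally rank collocates with a statistical association score instead, so that words like oscail and dúnta rise to the top.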

Statistics: At the bottom of the page, you will see some statistical data about how the word you searched for is distributed across the corpus, for instance by genre and dialect. For example, if you search for fata – one of the words for ‘potato’ – you will see that this word is used almost exclusively in the Connacht dialect. Once again, these statistics have been extracted from the corpus automatically. You can see more statistics by clicking the “more” link.

Advanced searches

If you want to perform more complicated searches on the corpus, you can use the options in the menu on the left-hand side. Here is a summary of the options available.

Concordance: This is where you can search for and list sentences from the corpus based on the words that occur in them. This search is more powerful than the one on the home page; for example, if you select “lemma” in the drop-down box, you can search for all forms of a word: type fuinneog ‘window’ and you will get sentences where any inflected or mutated form of the word occurs: fuinneoige, bhfuinneog and so on. You can also sort and filter the results in several ways.

Word List: This is where you can extract various word lists from the corpus, such as a list of the most frequently occurring words in Irish.
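As a rough illustration of what a frequency word list is, the sketch below counts word forms in a short text and returns the most frequent ones. The function name `word_list` and the sample text are invented, and the tokenisation is deliberately naive (for example, it would split hyphenated forms such as t-uisce).

```python
import re
from collections import Counter


def word_list(text, top=5):
    """Return the `top` most frequent word forms in `text`."""
    # In Python 3, \w matches Unicode letters, so accented vowels stay inside words.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens).most_common(top)


# Toy text invented for this example; it is not taken from the corpus.
sample_text = "Tá an fear ag an doras. Tá an bhean ag an bhfuinneog."

print(word_list(sample_text, top=3))
```

Note that this counts word forms, not lemmas: doras and dorais would be counted separately.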

Word Sketch: This section shows you which words are most frequently used along with the word you are looking for. The results are presented in several lists according to the grammatical relation that exists between the two words. For example, if you search for a verb, you will get one list of its direct objects, another list of its subjects, and so on. Remember that this information was extracted from the corpus automatically and so it may not always be accurate.

Thesaurus: This section allows you to type a word and get a list of other words that are similar to it with respect to their patterns of usage. For example, if you search for the adjective folláin ‘healthy, wholesome’, you will get a list that includes sláintiúil ‘healthy’, sábháilte ‘safe’ and others. In effect, this is a list of words that look like synonyms because they are used in similar ways. But again, remember that this information was extracted from the corpus automatically, so the words you receive may not necessarily be synonyms.

Sketch-Diff: This is a tool for investigating the difference between two words, based on other words that occur with them. If you type two words that are close to each other in meaning, for example leanbh ‘baby’ and páiste ‘child’, you will get information that may help you understand the difference between them: you will see that the words used mainly with leanbh ‘baby’ include saolaigh ‘give birth’ and baist ‘baptize’ while the words used mainly with páiste ‘child’ include múin ‘teach’ and foghlaim ‘learn’.
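The comparison can be pictured as sorting collocates into three groups: those found only with the first word, those shared by both, and those found only with the second. The sketch below does exactly that with plain sets. It is a simplification under assumptions: the function name `sketch_diff` is invented, the collocate beag ‘small’ is added purely for illustration, and a real tool would rank collocates by score rather than simply splitting them into groups.

```python
def sketch_diff(collocates_a, collocates_b):
    """Split two collocate sets into only-A, shared and only-B lists."""
    a, b = set(collocates_a), set(collocates_b)
    only_a = sorted(a - b)
    shared = sorted(a & b)
    only_b = sorted(b - a)
    return only_a, shared, only_b


# Collocates from the example above, plus the invented shared collocate "beag".
leanbh = {"saolaigh", "baist", "beag"}
paiste = {"múin", "foghlaim", "beag"}

only_leanbh, shared, only_paiste = sketch_diff(leanbh, paiste)
print("leanbh only:", only_leanbh)
print("shared:     ", shared)
print("páiste only:", only_paiste)
```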