2012
- 14 May 2012
- We are looking for a salesperson. Full advertisement here
- 7 May 2012
- We are finalizing new features in beta. You are welcome to test the new features at https://beta.sketchengine.co.uk. The changes you will see include:
- multiword sketches
- searching in parallel corpora
- shortcut for switching sort order (salience/frequencies) in Word Sketches
- interface language switch (Chinese, Czech, English, Irish)
- We have also built the following new TenTen scaled web corpora:
- arTenTen (Arabic, 5.8 G words)
- esAmTenTen (American Spanish, 7.5 G words, word sketches available, to be merged with the European Spanish TenTen)
- czTenTen2 (Czech, 4.8 G words)
- frTenTen (French, 10.7 G words, word sketches available)
- jpTenTen (Japanese, 9.1 G words)
- ruTenTen (Russian, 15.8 G words, word sketches available)
- 29 February 2012
- Last day with the company for both Jan Pomikalek and Diana McCarthy. After all the wonderful work they have done and the pleasure it has been to work with them, we are sad to bid them farewell. We wish them both the very best for the future
- 20 February 2012
- Conference and workshop papers accepted:
- WAC-7 Web as Corpus Workshop, Lyon, April 2012
- Vit Suchomel and Jan Pomikalek: Efficient Web Crawling for Large Text Corpora
- LREC Language Resources and Evaluation Conference, Istanbul, May 2012
- Bharat Ram Ambati, Siva Reddy, Adam Kilgarriff: Word Sketches for Turkish
- EURALEX, European Lexicography Conference, Oslo, August 2012
- Milos Jakubicek, Adam Kilgarriff, Pavel Rychly, Vojtech Kovar: Finding Multiwords of More Than Two Words
- Adam Kilgarriff, Jan Pomikalek, Pete Whitelock: Setting up for Corpus Lexicography
- Diana McCarthy, Avinesh PVS, Dominic Glennon: Domain Specific Corpora from the Web
- 31 January 2012
- BBC Radio 2's Chris Evans and Oxford University Press join forces to explore children's writing - with Sketch Engine as the back-end technology: radio interview here, 19.10 minutes in
- 30 January 2012
- Digital Languages: Using corpora for your research questions, workshop led by Adam Kilgarriff at the University of Sussex
- Hindi word sketches now available
- 16-17 January 2012
- Adam Kilgarriff presented the Sketch Engine at University of Heidelberg, Departments of English and of Translation
- Featured in Juliette Scott's blog, here
- 11 January 2012
- You can now upload multiple files in an archive when uploading your own corpus to Sketch Engine. See the relevant help page
- 6 January 2012
- An enthusiastic blogger we have come across is Anth of Operative Words. Thank you, Silvia Bernardini and Juliette Scott, for both (independently) spotting it
- A brand new, web-crawled, 2-billion word Chinese corpus, zhTenTen, is now available
2011
- 25 November 2011
- We fixed a bug in WebBootCaT which caused problems with using the service in Internet Explorer.
- 11 November 2011
- Siva Reddy and Diana McCarthy, both from Lexical Computing Ltd., are co-authors with Ioannis Klapaftis and Suresh Manandhar of a paper that won the best paper award at the 5th International Joint Conference on Natural Language Processing. The paper is listed in the Sketch Engine Bibliography and uses data from Sketch Engine for modelling the semantics of compound nouns
- 27 October 2011
- An RSS feed for Sketch Engine news is now available at http://www.sketchengine.co.uk/rss.cgi
- 7 July 2011
- Opening of the 2011 LSA Linguistics Institute in Boulder, Colorado, sponsored by LCL. All participants receive a one-year Sketch Engine account.
- Six 'Brown Family' corpora available in Sketch Engine, supporting comparisons across genre, time and dialect
- original Brown (US, 1961), LOB (UK, 1961), BLOB (UK, 1931), FLOB (UK, 1991), FROWN (US, 1991), BrE06 (UK, 2006)
- 1 July 2011
- Diana McCarthy, LCL Director and Erasmus Mundus Fellow, is visiting Melbourne University for a month.
- 29 June 2011
- Sketch Engine workshop in Taipei, Taiwan, hosted by Bookman Books with speakers Wallace Chen, Jerome Su, Howard Chang
- 15 June 2011
- CHILDES/TalkBank data (parent-child dialogs) available in the Sketch Engine, for multiple languages. 23m words for English.
- 2 June 2011
- The UK National Ecosystem Assessment (http://uknea.unep-wcmc.org/) launches its Synthesis Report with its key findings. Sketch Engine was used for the corpus linguistics analysis. See page 41 of the report, which can be obtained here
- 7 May 2011
- GDEX (Good Dictionary EXamples): infrastructure is now set up so customers can develop their own GDEX configuration (e.g. for a different language, publisher or dictionary). Documentation at https://trac.sketchengine.co.uk/wiki/GDEX
- 30 April 2011
- First version of CCBC (Comparable Corpora BootCaT) and bilingual word sketches presented at 'Research Models in Translations Theory' conference, Manchester. Powerpoint here
- 27 April 2011
- Web corpus tools developed by Jan Pomikalek, for his PhD and within PRESEMT, made available:
- jusText, for web page cleaning including removing boilerplate, http://code.google.com/p/justext/
- Onion, for deduplication, http://code.google.com/p/onion/
- 25 April 2011
- First version of bilingual word sketches prepared
- 15 April 2011
- Polish word sketches available
- 30 March 2011
- First version of Bulgarian word sketches available; SkE going into use at the Institute for the Bulgarian Language, Sofia
- 22 March 2011
- New front pages (including this news page) go live
- 17 March 2011
- New improved wordlist functionality and sketch diffs by subcorpus available on beta
- 16-17 March 2011
- SKEW-2, the 2nd International Sketch Engine Workshop, was held in Brighton
- 11 March 2011
- We now own some servers (as well as renting others) and will be moving services to the servers we own
- Users can now upload their own aligned, parallel corpora within Corpus Architect
- 11 March 2011
- Paper to appear in Corpus Linguistics 2011 on our work on CLAEVIPS (A Corpus Linguistic Analysis of Ecosystems Vocabulary in the Public Sphere), commissioned by the UK National Ecosystem Assessment
