Sketch Engine

Ticket #60 (new support request)

Opened 11 months ago

adding the Enron corpus

Reported by: neil cooke <n.cooke@surrey.ac.uk>
Assigned to: jan
Priority: low
Component: Corpus Builder
Version: stable
Keywords: Enron corpus
Cc: n.cooke@surrey.ac.uk

Description

I would like to add two Enron corpora, but they are much too large for a personal account.

Is it possible to add them in a way that makes them available to all users? Any advice on the best way to present them to your system would be appreciated. Details below:

The raw Enron corpus is an email corpus of about 500,000 emails, each in its own file containing the whole email, including the email's machine data. When expanded, it occupies about 1.3 GB of disk space.

The clean Enron corpus is about 255,000 non-duplicate emails and would consist of the text body parts only. When expanded, it occupies about 500 MB of disk space.

I can merge these so that they are presented to wordsketch as whole users (approx. 150) and then zip them for upload.
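
For reference, the merge-and-zip step could look roughly like the sketch below. It assumes the expanded corpus has one top-level directory per user with that user's email files somewhere beneath it; the function names, the one-text-file-per-user output, and the blank-line separator between emails are my own choices for illustration, not anything Sketch Engine prescribes. It keeps only the plain-text body of each email and drops the headers.

```python
import email
import zipfile
from email import policy
from pathlib import Path


def body_text(raw: bytes) -> str:
    """Return the plain-text body of one email, without its headers."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""
    try:
        return part.get_content().strip()
    except LookupError:
        # Unknown charset label: fall back to a lossless byte-to-text decode.
        payload = part.get_payload(decode=True) or b""
        return payload.decode("latin-1").strip()


def merge_users(maildir: Path, out_dir: Path) -> list[Path]:
    """Write one merged text file per user directory (assumed layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    for user_dir in sorted(p for p in maildir.iterdir() if p.is_dir()):
        out_path = out_dir / f"{user_dir.name}.txt"
        with out_path.open("w", encoding="utf-8") as out:
            for mail in sorted(p for p in user_dir.rglob("*") if p.is_file()):
                text = body_text(mail.read_bytes())
                if text:
                    # A blank line separates consecutive emails.
                    out.write(text + "\n\n")
        merged.append(out_path)
    return merged


def zip_files(files: list[Path], archive: Path) -> None:
    """Pack the merged per-user files into a single zip for upload."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)
```

Calling `merge_users(Path("maildir"), Path("merged"))` and then passing the result to `zip_files` with a target such as `Path("enron_by_user.zip")` would produce the upload archive; those paths are placeholders. If the system prefers a different input format, only the writing step in `merge_users` would need to change.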

Sketch Engine
Bringing Corpora to the Masses

Lexical Computing Ltd
