Dataset for automatic helpdesk response generation

The files extract*.txt.gz are three extracts from the MS Access database, exported as text files. Each item is preceded by the separator string “|——>” and contains the following fields:
–  “direction” (ingoing or outgoing)
–  “objid” (email identifier)
–  “communication2dialogue” (dialogue identifier)
–  “from_address” (email address – disguised by HP)
–  “creation_time” (date and time)
–  “title” (email subject)
–  “text” (the body of the email)
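
As a rough starting point for reading the extracts, the sketch below splits a file into raw items using the separator described above. It rests on assumptions not stated in this README: the function names split_items and read_items are hypothetical, the separator string is copied verbatim from this file and should be checked against the actual data, and the UTF-8 encoding is a guess. How the fields are laid out inside an item is not documented here, so each item is returned as unparsed text.

```python
import gzip

# Assumption: separator copied verbatim from this README; verify it
# against the actual extract files before relying on it.
SEPARATOR = "|——>"


def split_items(text, separator=SEPARATOR):
    """Return the raw items in text, each of which is preceded by separator."""
    chunks = text.split(separator)
    # Anything before the first separator is not an item, so skip chunks[0].
    return [chunk.strip() for chunk in chunks[1:]]


def read_items(path, encoding="utf-8", separator=SEPARATOR):
    """Read one gzip-compressed extract and return its raw items.

    Assumption: the encoding is a guess; undecodable bytes are replaced
    rather than raising an error.
    """
    with gzip.open(path, "rt", encoding=encoding, errors="replace") as handle:
        return split_items(handle.read(), separator)
```

Calling read_items on one of the extract files should give a list with one string per email; extracting the individual fields from each string would need to be adapted after inspecting the data.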

The file is a conversion of the original MS SQL backup file into an MS Access database. This conversion was carried out by the Technical Support Unit at the Faculty of Information Technology, Monash University.

Please cite the following paper if you make use of this dataset:
An Empirical Study of Corpus-based Response Automation Methods for an Email-based Help-desk Domain, Yuval Marom and Ingrid Zukerman. Computational Linguistics 35(4), 1-39, 2009.