Towards electronic SMS dictionary construction: an alignment-based approach

Lopez Cédric, Bestandji Reda, Roche Mathieu, Panckhurst Rachel. 2014. Towards electronic SMS dictionary construction: an alignment-based approach. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC'14): ISO Workshop on interoperable semantic annotation, 2014, May 26-31, Reykjavik, Iceland. Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, B. Paris: ELRA, pp. 2833-2838. International Conference on Language Resources and Evaluation. 9, Reykjavik, Iceland, 26 May 2014/31 May 2014.

Paper with proceedings
Published version - English
License: Creative Commons.

Abstract: In this paper, we propose a method for aligning text messages (entitled AlignSMS) in order to automatically build an SMS dictionary. An extract of 100 text messages from the 88milSMS corpus (Panckhurst et al., 2013, 2014) was used as an initial test. More than 90,000 authentic text messages in French were collected from the general public by a group of academics in the south of France in the context of the sud4science project. This project is itself part of a vast international SMS data collection project, entitled sms4science (Fairon et al., 2006; Cougnon, 2014). After corpus collation, pre-processing and anonymisation (Accorsi et al., 2012; Patel et al., 2013), we discuss how "raw" anonymised text messages can be transcoded into normalised text messages, using a statistical alignment method. The future objective is to set up a hybrid (symbolic/statistical) approach based on both grammar rules and our statistical AlignSMS method. (Author's abstract)
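
The abstract does not describe how AlignSMS works internally, so the following is only a rough sketch of the general idea of deriving dictionary entries by aligning "raw" messages with their normalised versions. It is not the authors' method: it swaps in Python's standard `difflib.SequenceMatcher` for the statistical alignment, and the function names (`align_tokens`, `build_dictionary`) and the sample messages are invented for illustration and are not taken from the 88milSMS corpus.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def align_tokens(raw, normalised):
    """Return (sms_form, standard_form) pairs where the two messages differ.

    Assumption: whitespace tokenisation and a generic sequence alignment
    stand in for the statistical alignment mentioned in the abstract.
    """
    raw_tokens = raw.lower().split()
    norm_tokens = normalised.lower().split()
    matcher = SequenceMatcher(a=raw_tokens, b=norm_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue  # keep only substituted spans; skip equal/insert/delete
        if i2 - i1 == j2 - j1:
            # Same number of tokens on both sides: pair them one by one.
            pairs.extend(zip(raw_tokens[i1:i2], norm_tokens[j1:j2]))
        else:
            # Uneven spans: keep them as a single multi-token entry.
            pairs.append((" ".join(raw_tokens[i1:i2]),
                          " ".join(norm_tokens[j1:j2])))
    return pairs


def build_dictionary(message_pairs):
    """Map each SMS form to its most frequently aligned standard form."""
    counts = defaultdict(Counter)
    for raw, normalised in message_pairs:
        for sms_form, standard_form in align_tokens(raw, normalised):
            counts[sms_form][standard_form] += 1
    return {sms: c.most_common(1)[0][0] for sms, c in counts.items()}


# Invented example messages (raw, normalised), for illustration only.
sample_pairs = [
    ("slt ca va", "salut ça va"),
    ("slt tu fais koi", "salut tu fais quoi"),
    ("je c pas", "je sais pas"),
]

for sms_form, standard_form in build_dictionary(sample_pairs).items():
    print(sms_form, "->", standard_form)
```

Counting aligned pairs across many messages, rather than trusting a single alignment, is one simple way to let frequent correspondences outweigh occasional alignment errors; a real system would need a much larger sample and a more careful treatment of ambiguous forms.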

Agris classification: C30 - Documentation and information

Authors and affiliations

  • Lopez Cédric, VISEO (FRA)
  • Bestandji Reda, LIRMM (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Panckhurst Rachel, CNRS (FRA)

Source: Cirad - Agritrop
