Extracting new spatial entities and relations from short messages

Zenasni Sarah, Kergosien Eric, Roche Mathieu, Teisseire Maguelonne. 2016. Extracting new spatial entities and relations from short messages. In : Proceedings of the 8th International Conference on Management of Digital EcoSystems. Chbeir Richard (ed.). New York : ACM, pp. 189-196. ISBN 978-1-4503-4267-4 International ACM Conference on Management of Digital EcoSystems. 8, Biarritz, France, 1 November 2016/4 November 2016.

Paper with proceedings
Published version - English
Access restricted to CIRAD agents
Use under authorization by the author or CIRAD.

Abstract : In the past few years, texts have become an important spatial data resource, in addition to maps, satellite images and GPS. Electronic written texts used in mediated interactions, especially short messages (SMS, tweets, etc.), have triggered the emergence of new ways of writing. Extracting information from such short messages, which represent a rich source of information and opinion, is highly important due to the new and challenging text style. Short messages are, however, difficult to analyze because of their brief, unstructured and informal nature. The work presented in this paper is aimed at extracting spatial information from two authentic corpora of SMS and tweets in French in order to take advantage of the vast amount of geographical knowledge expressed in diverse natural language texts. We propose a process in which, firstly, we extract new spatial entities (e.g. Monpelier and Montpel are associated with the place name Montpellier). Secondly, we identify new spatial relations that precede these spatial entities (e.g. sur, par, etc.). Finally, we propose a general pattern for discovering spatial relations (e.g. SR+ Preposition). The task is very challenging and complex due to the specificity of short-message language, which is based on weakly standardized modes of writing (lexical creation, massive use of abbreviations, textual variants, etc.). The experiments that were carried out on the two corpora 88milSMS and Tweets highlight the efficiency of our proposed strategy for identifying new kinds of spatial entities and relations. (Author's abstract)
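
Illustrative sketch (not from the paper) : The first two steps described in the abstract can be approximated as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the abstract does not name the similarity measure used, so the standard-library difflib.SequenceMatcher ratio stands in for it, and the gazetteer, the 0.75 threshold and the function names best_place_match and preceding_words are hypothetical choices made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy gazetteer; the place-name resource used in the paper
# is not specified in this record.
GAZETTEER = ["Montpellier", "Marseille", "Toulouse"]


def best_place_match(token, gazetteer=GAZETTEER, threshold=0.75):
    """Return (place, score) for the closest gazetteer entry,
    or None if no entry reaches the (assumed) threshold."""
    best_place, best_score = None, 0.0
    for place in gazetteer:
        # Stand-in similarity measure: difflib's ratio on lowercased strings.
        score = SequenceMatcher(None, token.lower(), place.lower()).ratio()
        if score > best_score:
            best_place, best_score = place, score
    if best_score >= threshold:
        return best_place, best_score
    return None


def preceding_words(messages, gazetteer=GAZETTEER, threshold=0.75):
    """Count the word immediately before each token matched to a place.
    Frequent words are candidate spatial relations (e.g. 'sur', 'par')."""
    counts = Counter()
    for message in messages:
        tokens = message.split()  # naive whitespace tokenization
        for previous, token in zip(tokens, tokens[1:]):
            if best_place_match(token, gazetteer, threshold) is not None:
                counts[previous.lower()] += 1
    return counts


# Invented example messages, written for this sketch only.
messages = [
    "je suis sur Monpelier",
    "on passe par Montpel demain",
    "rdv sur montpellier",
]
candidates = preceding_words(messages)
```

On these invented messages, the variants Monpelier and Montpel are both mapped to Montpellier, and the words counted before them are sur and par. A real pipeline would need tokenization adapted to SMS and tweet text and a threshold tuned on annotated data; short function words can otherwise be mistaken for place names.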

Free keywords : Text mining, Natural language processing, Spatial entities, Spatial relations, Similarity measure, POS, Tagging, SMS corpus, Tweet corpus

Agris classification : C30 - Documentation and information
B10 - Geography
U30 - Research methods
000 - Other themes

Authors and affiliations

  • Zenasni Sarah, CIRAD-ES-UMR TETIS (FRA)
  • Kergosien Eric, LIRMM (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Teisseire Maguelonne, IRSTEA (FRA)

Source : Cirad-Agritrop
