Agritrop

Extracting new spatial entities and relations from short messages

Zenasni Sarah, Kergosien Eric, Roche Mathieu, Teisseire Maguelonne. 2016. Extracting new spatial entities and relations from short messages. In: Proceedings of the 8th International Conference on Management of Digital EcoSystems. Chbeir Richard (ed.). New York: ACM, 189-196. ISBN 978-1-4503-4267-4. International ACM Conference on Management of Digital EcoSystems, 8th, Biarritz, France, 1-4 November 2016.

Conference paper with proceedings
Published version - English
Access restricted to Cirad staff
Use subject to authorization from the author or Cirad.
Zenasni_MEDES2016.pdf


Dataset URL - other repository: https://repository.ortolang.fr/api/content/comere/v3.2/cmr-88milsms.html / Dataset URL - Cirad Dataverse: https://doi.org/10.18167/DVN1/0ZGJRC

Abstract: In the past few years, texts have become an important spatial data resource, in addition to maps, satellite images and GPS. Electronic written texts used in mediated interactions, especially short messages (SMS, tweets, etc.), have triggered the emergence of new ways of writing. Extracting information from such short messages, which represent a rich source of information and opinion, is highly important due to the new and challenging text style. Short messages are, however, difficult to analyze because of their brief, unstructured and informal nature. The work presented in this paper is aimed at extracting spatial information from two authentic corpora of SMS and tweets in French in order to take advantage of the vast amount of geographical knowledge expressed in diverse natural language texts. We propose a process in which, firstly, we extract new spatial entities (e.g. Monpelier and Montpel are associated with the place name Montpellier). Secondly, we identify new spatial relations that precede these spatial entities (e.g. sur, par, etc.). Finally, we propose a general pattern for discovering spatial relations (e.g. SR + Preposition). The task is very challenging and complex due to the specificity of short message language, which is based on weakly standardized modes of writing (lexical creation, massive use of abbreviations, textual variants, etc.). The experiments that were carried out on the two corpora, 88milSMS and Tweets, highlight the efficiency of our proposed strategy for identifying new kinds of spatial entities and relations.
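
The first two steps of the process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity measure is used, so Python's standard `difflib.SequenceMatcher` stands in for it here. The gazetteer contents, the 0.7 threshold and the function names `match_place` and `candidate_relations` are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer of known place names (illustration only).
GAZETTEER = ["Montpellier", "Biarritz", "Paris"]

# Hypothetical similarity threshold; the paper's actual value is not given here.
THRESHOLD = 0.7


def match_place(token, gazetteer=GAZETTEER, threshold=THRESHOLD):
    """Return the gazetteer entry most similar to token, or None.

    Similarity is difflib's ratio, used as a stand-in for whatever
    measure the authors apply.
    """
    best_place, best_score = None, 0.0
    for place in gazetteer:
        score = SequenceMatcher(None, token.lower(), place.lower()).ratio()
        if score > best_score:
            best_place, best_score = place, score
    return best_place if best_score >= threshold else None


def candidate_relations(message, gazetteer=GAZETTEER):
    """Return (preceding word, place) pairs for each matched spatial entity.

    The word just before a recognized entity is treated as a candidate
    spatial relation, mirroring the second step in the abstract.
    """
    tokens = message.split()
    pairs = []
    for i, token in enumerate(tokens):
        place = match_place(token, gazetteer)
        if place is not None and i > 0:
            pairs.append((tokens[i - 1], place))
    return pairs


if __name__ == "__main__":
    print(match_place("Monpelier"))
    print(candidate_relations("je suis sur Montpel"))
```

In this sketch, a spelling variant is linked to a place name only if its similarity score reaches the threshold, so ordinary words are left unmatched. A real system would need tokenization suited to SMS and tweets, and a part-of-speech check on the preceding word before accepting it as a spatial relation.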

Free keywords: Text mining, Natural language processing, Spatial entities, Spatial relations, Similarity measure, POS, Tagging, SMS corpus, Tweet corpus

Agris classification: C30 - Documentation and information
B10 - Geography
U30 - Research methods
000 - Other themes

Authors and affiliations

  • Zenasni Sarah, CIRAD-ES-UMR TETIS (FRA)
  • Kergosien Eric, LIRMM (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Teisseire Maguelonne, IRSTEA (FRA)

Source: Cirad-Agritrop (https://agritrop.cirad.fr/582633/)
