Zenasni Sarah, Kergosien Eric, Roche Mathieu, Teisseire Maguelonne.
2016. Extracting new spatial entities and relations from short messages.
In: Proceedings of the 8th International Conference on Management of Digital EcoSystems. Chbeir Richard (ed.)
Published version
- English
Access restricted to Cirad staff. Use subject to authorization from the author or from Cirad. Zenasni_MEDES2016.pdf (899kB)
Dataset URL (other repository): https://repository.ortolang.fr/api/content/comere/v3.2/cmr-88milsms.html / Dataset URL (Cirad Dataverse): https://doi.org/10.18167/DVN1/0ZGJRC
Abstract: In the past few years, texts have become an important spatial data resource, in addition to maps, satellite images and GPS. Electronic written texts used in mediated interactions, especially short messages (SMS, tweets, etc.), have given rise to new ways of writing. Extracting information from such short messages, which are a rich source of information and opinion, is highly important given this new and challenging text style. Short messages are, however, difficult to analyze because of their brief, unstructured and informal nature. The work presented in this paper aims to extract spatial information from two authentic corpora of French SMS and tweets in order to take advantage of the vast amount of geographical knowledge expressed in diverse natural language texts. We propose a process in which, first, we extract new spatial entities (e.g. Monpelier and Montpel are associated with the place name Montpellier). Second, we identify new spatial relations that precede these spatial entities (e.g. sur, par). Finally, we propose a general pattern for discovering spatial relations (e.g. SR + Preposition). The task is very challenging and complex because of the specific language of short messages, which relies on weakly standardized modes of writing (lexical creation, massive use of abbreviations, textual variants, etc.). Experiments carried out on the two corpora, 88milSMS and Tweets, highlight the efficiency of our proposed strategy for identifying new kinds of spatial entities and relations.
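The first two steps described in the abstract (linking spelling variants to a known place name, then collecting the words that precede those entities as candidate spatial relations) can be illustrated with a minimal sketch. This record does not state which similarity measure, gazetteer, tokenizer or threshold the authors used, so everything below is an assumption: the gazetteer, the 0.75 threshold, the helper names `normalize`, `match_place` and `extract`, and the use of Python's `difflib.SequenceMatcher.ratio()` as a stand-in similarity measure are all hypothetical, and the two example messages are invented rather than drawn from 88milSMS or the tweet corpus.

```python
import difflib
import re
import unicodedata
from collections import Counter

# Hypothetical gazetteer of canonical place names (illustrative assumption,
# not the resource used in the paper).
GAZETTEER = ["Montpellier", "Toulouse"]


def normalize(token: str) -> str:
    """Lowercase and strip diacritics, e.g. 'Montpéllier' -> 'montpellier'."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def match_place(token: str, threshold: float = 0.75):
    """Return the most similar gazetteer name, or None if no score reaches threshold.

    difflib's ratio (2*M/T) is a stand-in for the paper's similarity measure,
    which this record does not specify.
    """
    best_name, best_score = None, 0.0
    for name in GAZETTEER:
        score = difflib.SequenceMatcher(None, normalize(token), normalize(name)).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def extract(messages):
    """Step 1: map surface variants to place names.
    Step 2: count the word preceding each detected entity as a candidate spatial relation."""
    variants = {}          # canonical name -> set of observed surface forms
    relations = Counter()  # normalized preceding word -> frequency
    for msg in messages:
        tokens = re.findall(r"\w+", msg)  # naive Unicode-aware word tokenizer
        for i, tok in enumerate(tokens):
            place = match_place(tok)
            if place is None:
                continue
            variants.setdefault(place, set()).add(tok)
            if i > 0:
                relations[normalize(tokens[i - 1])] += 1
    return variants, relations


# Invented example messages (not taken from 88milSMS or the tweet corpus).
msgs = ["je passe par Monpelier demain", "rdv sur Montpel ce soir"]
variants, relations = extract(msgs)
print(sorted(variants["Montpellier"]))  # ['Monpelier', 'Montpel']
print(dict(relations))                  # {'par': 1, 'sur': 1}
```

Treating every preceding word as a candidate relation is deliberately crude; filtering candidates with a part-of-speech tagger (POS tagging is among the record's keywords) would be a natural refinement, but it is not shown here and is not claimed to be the authors' procedure.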
Free keywords: Text mining, Natural language processing, Spatial entities, Spatial relations, Similarity measure, POS, Tagging, SMS corpus, Tweet corpus
Agris classification: C30 - Documentation and information
B10 - Geography
U30 - Research methods
000 - Other themes
Authors and affiliations
- Zenasni Sarah, CIRAD-ES-UMR TETIS (FRA)
- Kergosien Eric, LIRMM (FRA)
- Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
- Teisseire Maguelonne, IRSTEA (FRA)
Source: Cirad-Agritrop (https://agritrop.cirad.fr/582633/)