WEIR-P: An information extraction pipeline for the wastewater domain

Chahinian Nanée, Bonnabaud La Bruyère Thierry, Frontini Francesca, Delenne Carole, Julien Marin, Panckhurst Rachel, Roche Mathieu, Sautot Lucile, Deruelle Laurent, Teisseire Maguelonne. 2021. WEIR-P: An information extraction pipeline for the wastewater domain. In: Research challenges in information science: 15th International Conference, RCIS 2021, Limassol, Cyprus, May 11–14, 2021, Proceedings. Cherfi Samira (ed.), Perini Anna (ed.), Nurcan Selmin (ed.). Cham: Springer, 171-188. (Lecture Notes in Business Information Processing, 415) ISBN 978-3-030-75017-6 International Conference on Research Challenges in Information Science (RCIS 2021). 15, Limassol, Cyprus, 11 May 2021/14 May 2021.

https://doi.org/10.1007/978-3-030-75018-3_11

Conference paper with proceedings

Published version - English
Access restricted to Cirad staff
Use subject to authorization from the author or Cirad.
Chahinian_et_al_RCIS_2021.pdf

URL - dataset - other repository: https://doi.org/10.23708/H0VXH0

Abstract: We present the MeDO project, aimed at developing resources for text mining and information extraction in the wastewater domain. We developed a specific Natural Language Processing (NLP) pipeline named WEIR-P (WastewatEr InfoRmation extraction Platform) which identifies the entities and relations to be extracted from texts, pertaining to information, wastewater treatment, accidents and works, organizations, spatio-temporal information, measures and water quality. We present and evaluate the first version of the NLP system which was developed to automate the extraction of the aforementioned annotation from texts and its integration with existing domain knowledge. The preliminary results obtained on the Montpellier corpus are encouraging and show how a mix of supervised and rule-based techniques can be used to extract useful information and reconstruct the various phases of the extension of a given wastewater network. While the NLP and Information Extraction (IE) methods used are state of the art, the novelty of our work lies in their adaptation to the domain, and in particular in the wastewater management conceptual model, which defines the relations between entities. French resources are less developed in the NLP community than English ones. The datasets obtained in this project are another original aspect of this work.

Free keywords: Wastewater, Text Mining, Information extraction, Natural language processing

Authors and affiliations

Chahinian Nanée, IRD (FRA)
Bonnabaud La Bruyère Thierry, Université de Montpellier (FRA)
Frontini Francesca, Istituto di Linguistica Computazionale (ITA)
Delenne Carole, INRIA (FRA)
Julien Marin
Panckhurst Rachel, Université Paul Valéry Montpellier 3 (FRA)
Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
Sautot Lucile, AgroParisTech (FRA)
Deruelle Laurent
Teisseire Maguelonne, INRAE (FRA)

Other links for this publication

Book or proceedings

Source: Cirad-Agritrop (https://agritrop.cirad.fr/598243/)


