Menya Edmond, Roche Mathieu, Interdonato Roberto, Owuor Dickson.
2024. EpidGPT: A combined strategy to discriminate between redundant and new information for epidemiological surveillance systems.
In: Natural language processing and information systems: 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Turin, Italy, June 25–27, 2024, Proceedings, Part I. Rapp Amon (ed.), Di Caro Luigi (ed.), Meziane Farid (ed.), Sugumaran Vijayan (ed.)
Published version
- English
Access restricted to Cirad staff. Use subject to authorization from the author or Cirad. Menya_et_al_EpidGPT_2024.pdf (680kB)
Dataset URL (Cirad Dataverse): https://doi.org/10.18167/DVN1/WD1UC2
Abstract: Textual documents such as online news articles have become a key source for epidemiological surveillance, for example in the detection of new and re-emerging diseases. However, such sources contain redundancies, which creates a need to automate the identification of novel information. In this paper, we propose a framework for learning novel thematic information in epidemiological news documents. Our approach involves both the extraction and the classification of new, duplicate, additional and/or missing pieces of relevant information in epidemiological news documents. First, we propose an initial step to address the limited-data problem, where few gold-labelled datasets exist for training text-based epidemiological surveillance systems. This initial step uses an extractive question answering technique to automate the extraction of relevant thematic features, including disease and host names, the location and date of reported events, and the reported number of cases, in order to create a large silver-labelled dataset. We then propose a main step in which we build a novelty classification model trained on our large silver-labelled dataset. We test our novelty classifier alongside competitive models on the task of detecting whether a target epidemiological news article contains novel, redundant and/or missing information. Finally, we carry out ablation studies on the most informative document segments in epidemiological news articles.
Free keywords: Text Mining, Language Model, Animal disease surveillance
European funding agencies: European Commission
Non-EU funding agencies: Ambassade de France à Nairobi, Direction générale de l'alimentation
European funding programme: H2020
Funded projects: (EU) MOnitoring Outbreak events for Disease surveillance in a data science context
Authors and affiliations
- Menya Edmond, CIRAD-ES-UMR TETIS (FRA)
- Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
- Interdonato Roberto, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0002-0536-6277
- Owuor Dickson, Strathmore University (KEN)
Source: Cirad-Agritrop (https://agritrop.cirad.fr/610401/)