Valentin Sarah, Decoupes Rémy, Lancelot Renaud, Roche Mathieu. 2023. Animal disease surveillance: How to represent textual data for classifying epidemiological information. Preventive Veterinary Medicine, 216:105932, 9 p.
Published version
- English
Use subject to authorization from the author or Cirad. Valentin_et_al_PVM2023.pdf (2 MB)
Abstract: The value of informal sources in increasing the timeliness of disease outbreak detection and providing detailed epidemiological information in the early warning and preparedness context is recognized. This study evaluates machine learning methods for classifying information from animal disease-related news at a fine-grained level (i.e., epidemiological topic). We compare two textual representations, the bag-of-words method and a distributional approach, i.e., word embeddings. Both representations performed well for binary relevance classification (F-measure of 0.839 and 0.871, respectively). The bag-of-words representation was outperformed by the word embedding representation for classifying sentences into fine-grained epidemiological topics (F-measure of 0.745). Our results suggest that the word embedding approach is of interest in the context of low-frequency classes in a specialized domain. However, this representation did not bring significant performance improvements for binary relevance classification, indicating that the textual representation should be adapted to each classification task.
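As a rough illustration of the two textual representations compared in the abstract, the sketch below builds a bag-of-words count vector and an averaged word-embedding vector for one sentence, and computes the F-measure (F1) used to report results. The vocabulary, the two-dimensional embedding values, and the function names (`tokenize`, `bag_of_words`, `mean_embedding`, `f_measure`) are hypothetical. Averaging word vectors is one common way to turn word embeddings into a sentence representation; it is assumed here and is not necessarily the authors' pipeline.

```python
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (assumption: no punctuation handling).
    return text.lower().split()


def bag_of_words(text: str, vocabulary: List[str]) -> List[int]:
    # One count per vocabulary term; word order is discarded.
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def mean_embedding(text: str, embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    # Average the vectors of tokens that have an embedding; unknown tokens are skipped.
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def f_measure(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall (F1).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = "avian influenza outbreak near a farm with a second outbreak"
    vocabulary = ["avian", "influenza", "outbreak", "farm"]  # hypothetical vocabulary
    toy_embeddings = {  # hypothetical 2-dimensional word vectors
        "avian": [1.0, 0.0],
        "influenza": [0.0, 1.0],
        "outbreak": [1.0, 1.0],
    }
    print(bag_of_words(sentence, vocabulary))           # [1, 1, 2, 1]
    print(mean_embedding(sentence, toy_embeddings, 2))  # [0.75, 0.75]
    print(f_measure(tp=8, fp=2, fn=2))
```

Design note: the bag-of-words vector has one dimension per vocabulary term, whereas the averaged embedding keeps a fixed dimension regardless of vocabulary size.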
Agrovoc keywords: epidemiology, classification, animal diseases, epidemiological surveillance, data analysis, animal health, machine learning
Free keywords: Event-based surveillance, Epidemic intelligence, Animal disease surveillance, Text Mining, Word embedding
Agris classification: L73 - Animal diseases
C30 - Documentation and information
U10 - Computer science, mathematics and statistics
Cirad strategic area: CTS 4 (2019-) - Plant, animal and ecosystem health
European funding agencies: European Commission
Non-EU funding agencies: Direction générale de l'alimentation, Centre de Coopération Internationale en Recherche Agronomique pour le Développement, Agence Nationale de la Recherche
European funding programme: H2020
Funded projects: (EU) MOnitoring Outbreak events for Disease surveillance in a data science context, (FRA) Institut Convergences en Agriculture Numérique
Authors and affiliations
- Valentin Sarah, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0002-9028-681X
- Decoupes Rémy, INRAE (FRA)
- Lancelot Renaud, CIRAD-BIOS-UMR ASTRE (REU)
- Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568 - corresponding author
Source: Cirad-Agritrop (https://agritrop.cirad.fr/605087/)