Rabatel Julien, Arsevska Elena, Roche Mathieu. 2019. PADI-web corpus: Labeled textual data in animal health domain. Data in Brief, 22 : 643-646.
Published version
- English
Dataset URL (Cirad Dataverse): https://doi.org/10.18167/DVN1/KMTIFG
Abstract: Monitoring animal health worldwide, especially the early detection of outbreaks of emerging pathogens, is one of the means of preventing the introduction of infectious diseases in countries (Collier et al., 2008). In this context, we developed PADI-web, a Platform for Automated extraction of animal Disease Information from the Web (Arsevska et al., 2016, 2018). PADI-web is a text-mining tool that automatically detects, categorizes and extracts disease outbreak information from Web news articles. PADI-web currently monitors the Web for five emerging animal infectious diseases, i.e., African swine fever, avian influenza including highly pathogenic and low pathogenic avian influenza, foot-and-mouth disease, bluetongue, and Schmallenberg virus infection. PADI-web collects Web news articles in near-real time through RSS feeds. Currently, PADI-web collects disease information from Google News because of its international and multiple language coverage. We implemented machine learning techniques to identify the relevant disease information in texts (i.e., location and date of an outbreak, affected hosts, their numbers and clinical signs). In order to train the model for Information Extraction (IE) from news articles, a corpus in English has been manually labeled by domain experts. This labeled corpus (Rabatel et al., 2017) is presented in this data paper.
Agrovoc keywords: animal health, text mining, epidemiological surveillance, data analysis
Additional keywords: textual data
Free keywords: Named entity recognition, Natural language processing, Text mining, Animal disease surveillance
Agris classification: L73 - Animal diseases
C30 - Documentation and information
Cirad strategic field: CTS 4 (2019-) - Health of plants, animals and ecosystems
Authors and affiliations
- Rabatel Julien
- Arsevska Elena, CIRAD-BIOS-UMR ASTRE (FRA) ORCID: 0000-0002-6693-2316
- Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568 - corresponding author
Source : Cirad-Agritrop (https://agritrop.cirad.fr/590515/)