
Towards combined semantic and lexical scores based on a new representation of textual data to extract experimental data from scientific publications

Lentschat Martin, Buche Patrice, Dibie-Barthélemy Juliette, Roche Mathieu. 2022. Towards combined semantic and lexical scores based on a new representation of textual data to extract experimental data from scientific publications. International Journal of Intelligent Information and Database Systems, 15 (1), special issue WIMS 2020 Web Intelligence: 78-103.

Journal article; Research article; Peer-reviewed journal article
Published version - English
Access restricted to Cirad staff
Use subject to authorization by the author or Cirad.
Lentschat_et_al_2022_IJIIDS.pdf


Dataset URL - Cirad Dataverse: https://doi.org/10.18167/DVN1/U7HK8J / Dataset URL - Cirad Dataverse: https://doi.org/10.18167/DVN1/FC2YXC

Abstract: This article presents a process, guided by an ontological and terminological resource, for the targeted extraction of scientific experimental data. Our method relies on the scientific publication representation (SciPuRe), which describes the extracted data through ontological, lexical and structural features, the latter being based on the segments of the scientific documents. Relevance scores based on these features are computed to rank the results and filter out the numerous false positives. Linear and sequential combinations of these scores are presented and evaluated. Experiments were carried out on a corpus of 50 English-language scientific papers in the food packaging field. They revealed that article segments are an effective criterion for filtering out a majority of the quantitative entity false positives using lexical scores. Moreover, the best symbolic entity extraction results were obtained with a sequential combination of semantic and lexical scores. These results enable the ranking of entities by relevance and the filtering of false positive results.
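
The difference between a linear and a sequential combination of two relevance scores can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` class, the score values, the weight `alpha` and the `semantic_threshold` cutoff are all hypothetical, and the abstract does not state how the published scores are computed or combined in detail.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical extracted entity with two precomputed scores in [0, 1]."""
    text: str
    semantic: float  # assumed semantic relevance score
    lexical: float   # assumed lexical relevance score


def linear_score(c: Candidate, alpha: float = 0.5) -> float:
    """Weighted sum of the two scores; alpha is an illustrative weight."""
    return alpha * c.semantic + (1.0 - alpha) * c.lexical


def rank_linear(candidates, alpha: float = 0.5):
    """Rank all candidates by the weighted sum, best first."""
    return sorted(candidates, key=lambda c: linear_score(c, alpha), reverse=True)


def rank_sequential(candidates, semantic_threshold: float = 0.5):
    """Filter on the semantic score first, then rank survivors by lexical score."""
    kept = [c for c in candidates if c.semantic >= semantic_threshold]
    return sorted(kept, key=lambda c: c.lexical, reverse=True)


# Made-up candidates and scores, for illustration only.
candidates = [
    Candidate("LDPE", semantic=0.9, lexical=0.2),
    Candidate("film", semantic=0.4, lexical=0.9),
    Candidate("PET", semantic=0.8, lexical=0.6),
]

linear_names = [c.text for c in rank_linear(candidates)]
sequential_names = [c.text for c in rank_sequential(candidates)]
print(linear_names)
print(sequential_names)
```

In this sketch the linear ranking keeps every candidate and only reorders them, whereas the sequential ranking discards candidates that fail the first criterion before the second score is applied, which is one way a combination can act as a false-positive filter.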

Agrovoc keywords: text mining, data mining, data analysis, data processing, information processing, food packaging, journal article

Free keywords: Text Mining, Information extraction, Information Retrieval, Data representation, Ontological and Terminological Resource, Ontology, Experimental data

Agris classification: C30 - Documentation and information
U10 - Computer science, mathematics and statistics
Q80 - Packaging

Cirad strategic field: CTS 7 (2019-) - Outside strategic fields

Authors and affiliations

  • Lentschat Martin, CIRAD-ES-UMR TETIS (FRA) - corresponding author
  • Buche Patrice, INRAE (FRA)
  • Dibie-Barthélemy Juliette, INRAE (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568

Source : Cirad-Agritrop (https://agritrop.cirad.fr/599839/)

