
Biomedical term extraction: overview and a new methodology

Lossio Ventura Juan Antonio, Jonquet Clément, Roche Mathieu, Teisseire Maguelonne. 2016. Biomedical term extraction: overview and a new methodology. Information Retrieval, 19 (1) : 59-99.

Journal article ; Research article ; Journal article with impact factor
Online first version - English
Access restricted to Cirad staff
Use subject to authorization from the author or Cirad.
578084.pdf (1MB)

Published version - English
Access restricted to Cirad staff
Use subject to authorization from the author or Cirad.
578084.pdf (1MB)

Dataset URL - Cirad Dataverse: https://doi.org/10.18167/DVN1/GQ8DPL / Dataset URL - Cirad Dataverse: https://doi.org/10.18167/DVN1/37ENLP

Quartile: Q4, Subject: COMPUTER SCIENCE, INFORMATION SYSTEMS

Abstract: Terminology extraction is an essential task in domain knowledge acquisition, as well as for information retrieval. It is also a mandatory first step in building or enriching terminologies and ontologies. As often proposed in the literature, existing terminology extraction methods combine linguistic and statistical aspects and only partially solve some problems related to term extraction, e.g. noise, silence, low frequency, large corpora, and the complexity of the multi-word term extraction process. In contrast, we propose a cutting-edge methodology to extract and rank biomedical terms that covers all of the problems mentioned. This methodology offers several measures based on linguistic, statistical, graph, and web aspects. These measures extract and rank candidate terms with excellent precision: we demonstrate that they outperform previously reported precision results for automatic term extraction and work with different languages (English, French, and Spanish). We also demonstrate how using graphs and the web to assess the significance of a term candidate enables us to outperform precision results. We evaluated our methodology on the biomedical GENIA and LabTestsOnline corpora and compared it with previously reported measures.

Agrovoc keywords: methodology, terminology, statistical methods, information retrieval, extraction

Free keywords: Automatic term extraction, Biomedical terminology extraction, Natural language processing, BioNLP, Text mining, Web mining, Graphs

Agris classification: C30 - Documentation and information
U10 - Computer science, mathematics and statistics

Cirad strategic field: Outside strategic axes (2014-2018)

Authors and affiliations

  • Lossio Ventura Juan Antonio, LIRMM (FRA)
  • Jonquet Clément, LIRMM (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Teisseire Maguelonne, LIRMM (FRA)

Source : Cirad-Agritrop (https://agritrop.cirad.fr/578084/)

