A novel framework for biomedical entity sense induction

Lossio-Ventura Juan Antonio, Bian J., Jonquet Clément, Roche Mathieu, Teisseire Maguelonne. 2018. A novel framework for biomedical entity sense induction. Journal of Biomedical Informatics, 84 : 31-41.

Journal article; Research article; Journal article with impact factor
Published version - English
Access restricted to Cirad staff
Use subject to authorization from the author or Cirad.

Quartile: Q2, Subject: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS / Quartile: Q2, Subject: MEDICAL INFORMATICS

Abstract: Background: Rapid advancements in biomedical research have increased the number of relevant electronic documents published online, ranging from scholarly articles to news, blogs, and user-generated social media content. Nevertheless, the vast amount of this information is poorly organized, making it difficult to navigate. Emerging technologies such as ontologies and knowledge bases (KBs) could help organize and track the information associated with biomedical research developments. A major challenge in the automatic construction of ontologies and KBs is identifying words together with their respective sense(s) in a free-text corpus. Word-sense induction (WSI) is the task of automatically inducing the different senses of a target word in different contexts. In the last two decades, there have been several efforts on WSI. However, few methods are effective in biomedicine and life sciences.
Methods: We developed a framework for biomedical entity sense induction using a mixture of natural language processing, supervised, and unsupervised learning methods with promising results. It is composed of three main steps: (1) a polysemy detection method to determine if a biomedical entity has many possible meanings; (2) a clustering quality index-based approach to predict the number of senses for the biomedical entity; and (3) a method to induce the concept(s) (i.e., senses) of the biomedical entity in a given context.
Results: To evaluate our framework, we used the well-known MSH WSD polysemic dataset that contains 203 annotated ambiguous biomedical entities, where each entity is linked to 2–5 concepts. First, our polysemy detection method obtained an F-measure of 98%. Second, our approach for predicting the number of senses achieved an F-measure of 93%. Finally, we induced the concepts of the biomedical entities based on a clustering algorithm and then extracted the keywords of each cluster to represent the concept.
Conclusions: We have developed a framework for biomedical entity sense induction with promising results. Our study results can benefit a number of downstream applications, for example, help to resolve concept ambiguities when building Semantic Web KBs from biomedical text.
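
Illustrative sketch (not part of the record): step (2) of the abstract predicts the number of senses from a clustering quality index. The abstract names neither the index nor the clustering algorithm, so the Python sketch below assumes a silhouette score and a plain k-means with a deterministic farthest-point initialization. The function names, the 2-D toy vectors standing in for context representations of one ambiguous entity, and the default range of 2 to 5 senses (taken from the dataset description above) are all assumptions made for illustration, not the authors' implementation.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain k-means; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Recompute each center as the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centers[c])  # keep the center of an empty cluster
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def predict_num_senses(points, k_min=2, k_max=5):
    """Score every candidate k and return the best k plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(k_min, k_max + 1)}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Toy context vectors (invented): three well-separated groups of four points.
contexts = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (0, 10), (0, 11), (1, 10), (1, 11),
]
best_k, scores = predict_num_senses(contexts)
print(best_k, scores)
```

On this toy input the three well-separated groups give the highest silhouette score at k = 3. Real context vectors would be high-dimensional and noisier, so the choice of index and clustering algorithm matters far more there than in this sketch.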

Free keywords: BioNLP, Polysemy, Text mining, Word sense disambiguation, Clustering

Agris classification: C30 - Documentation and information
U10 - Computer science, mathematics and statistics
000 - Other themes

Cirad strategic field: Outside strategic axes (2014-2018)

Authors and affiliations

  • Lossio-Ventura Juan Antonio, University of Florida (USA)
  • Bian J., University of Florida (USA) - corresponding author
  • Jonquet Clément, LIRMM (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Teisseire Maguelonne, IRSTEA (FRA)

Source: Cirad-Agritrop (https://agritrop.cirad.fr/588234/)
