Integration of linguistic and web information to improve biomedical terminology extraction

Lossio Ventura Juan Antonio, Jonquet Clément, Roche Mathieu, Teisseire Maguelonne. 2014. Integration of linguistic and web information to improve biomedical terminology extraction. In : Proceedings of the 18th International Database Engineering and Applications Symposium. ACM. New York : ACM, pp. 265-269. ISBN 978-1-4503-2627-8 International Database Engineering and Applications Symposium. 18, Porto, Portugal, 7 July 2014/9 July 2014.

Paper with proceedings
Published version - English
Access restricted to CIRAD agents
Use under authorization by the author or CIRAD.

Download (184kB)

Abstract : Comprehensive terminology is essential for a community to describe, exchange, and retrieve data. In multiple domains, the volume of text data produced has reached a level at which automatic terminology extraction and enrichment is mandatory. Automatic Term Extraction (or Recognition) methods use natural language processing to do so. Methods combining linguistic and statistical aspects, as often proposed in the literature, solve some problems related to term extraction, such as low frequency, the complexity of multi-word term extraction, and the human effort needed to validate candidate terms. In contrast, we present two new measures for extracting and ranking multi-word terms from domain-specific corpora, covering all the mentioned problems. In addition, we demonstrate how using the Web to evaluate the significance of a multi-word term candidate helps us to outperform the precision results obtained on the biomedical GENIA corpus with previously reported measures such as C-value. (Author's abstract)

Agris classification : C30 - Documentation and information
000 - Other themes

Authors and affiliations

  • Lossio Ventura Juan Antonio, LIRMM (FRA)
  • Jonquet Clément, LIRMM (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Teisseire Maguelonne, LIRMM (FRA)

Source : Cirad - Agritrop

