Biomedical terminology extraction: a new combination of statistical and web mining approaches

Lossio Ventura Juan Antonio, Jonquet Clément, Roche Mathieu, Teisseire Maguelonne. 2014. Biomedical terminology extraction: a new combination of statistical and web mining approaches. In: Proceedings of the 12th International Conference on Textual Data Statistical Analysis (JADT 2014), Paris, 3-6 June 2014 = Actes des 12es Journées internationales d'Analyse Statistique des Données Textuelles (JADT 2014), Paris, 3-6 juin 2014. Ed. Emilie Née. s.l.: s.n., p. 421-432. ISBN 978-2-9547781-1-2. International Conference on Textual Data Statistical Analysis, 12, Paris, France, 3 June 2014 - 6 June 2014.

Conference paper (published in proceedings)
Published version - English
Use subject to authorization from the author or Cirad.
document_574171.pdf (895 kB)

Abstract: The objective of this work is to combine statistical and web mining methods for the automatic extraction and ranking of biomedical terms from free text. We present new extraction methods that use linguistic patterns specialized for the biomedical field, together with term extraction measures such as C-value and keyword extraction measures such as Okapi BM25 and TF-IDF. We propose several combinations of these measures to improve the extraction and ranking process, and we investigate which combinations are most relevant in different cases. Each measure yields a ranked list of candidate terms, which we then re-rank with a new web-based measure. Our experiments show, first, that an appropriate harmonic mean of C-value and a keyword extraction measure gives better precision than either measure used alone, for the extraction of both single-word and multi-word terms; and second, that the best precision is often obtained after re-ranking with the web-based measure. We illustrate our results on the extraction of English and French biomedical terms from a corpus of laboratory tests available online in both languages. The results are validated using only UMLS (for English) and MeSH (for French) as reference dictionaries.
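The abstract combines C-value with a keyword measure through a harmonic mean. The Python below is a minimal, illustrative sketch of that idea, not the authors' implementation. It assumes candidate terms have already been extracted per document (the biomedical linguistic patterns are not shown) and are stored as token tuples. Several choices are assumptions made here for illustration: the function names are hypothetical, the C-value weight uses log2(|a| + 1) so that single-word terms are not scored zero, negative C-values are clipped to 0, TF-IDF is computed at corpus level, and both scores are divided by their maximum before the harmonic mean 2xy / (x + y) is taken. Okapi BM25 and the web-based re-ranking measure are not shown.

```python
import math
from collections import Counter

Term = tuple[str, ...]  # a candidate term as a tuple of tokens

# Illustrative sketch only: function names are hypothetical, not from the paper.

def contains(longer: Term, shorter: Term) -> bool:
    """True if `shorter` occurs as a contiguous run of tokens inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

def c_value(freq: Counter) -> dict:
    """C-value of each candidate term.

    Assumed variant: weight log2(|a| + 1) instead of log2(|a|), so that
    single-word terms (|a| = 1) keep a non-zero weight.
    """
    scores = {}
    for term, f in freq.items():
        weight = math.log2(len(term) + 1)
        # T_a: the longer candidates in which `term` is nested
        parents = [b for b in freq if len(b) > len(term) and contains(b, term)]
        if parents:
            # subtract the mean frequency of the longer terms containing `term`
            f -= sum(freq[b] for b in parents) / len(parents)
        scores[term] = max(weight * f, 0.0)  # clip negatives (assumption)
    return scores

def tf_idf(docs: list) -> dict:
    """Corpus-level TF-IDF: total frequency times log(N / document frequency)."""
    n_docs = len(docs)
    tf = Counter(t for doc in docs for t in doc)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf}

def harmonic_mean_rank(a: dict, b: dict) -> list:
    """Scale each measure to [0, 1] by its maximum, combine the two with the
    harmonic mean 2xy / (x + y), and return (term, score) pairs, best first."""
    max_a = max(a.values()) or 1.0
    max_b = max(b.values()) or 1.0
    combined = {}
    for t in a:
        x, y = a[t] / max_a, b.get(t, 0.0) / max_b
        combined[t] = 2 * x * y / (x + y) if x + y > 0 else 0.0
    # ties are broken alphabetically so that the ranking is deterministic
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))

if __name__ == "__main__":
    # Toy corpus: each document is a list of already-extracted candidates
    docs = [
        [("white", "blood", "cell"), ("blood", "cell")],
        [("white", "blood", "cell"), ("blood", "cell")],
        [("blood", "cell"), ("glucose",)],
    ]
    freq = Counter(t for doc in docs for t in doc)
    for term, score in harmonic_mean_rank(c_value(freq), tf_idf(docs)):
        print(" ".join(term), round(score, 3))
```

The harmonic mean is low whenever either input is low. In the toy corpus, "blood cell" occurs in every document, so its TF-IDF is 0 and it falls to the bottom of the list even though its C-value is positive. This is the kind of trade-off the combined measures are meant to capture.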

Agris classification: C30 - Documentation and information
000 - Other themes

Authors and affiliations

  • Lossio Ventura Juan Antonio, LIRMM (FRA)
  • Jonquet Clément, LIRMM (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Teisseire Maguelonne, LIRMM (FRA)

Source: Cirad - Agritrop (https://agritrop.cirad.fr/574171/)
