Yet another ranking function for automatic multi-word term extraction

Lossio Ventura Juan Antonio, Jonquet Clément, Roche Mathieu, Teisseire Maguelonne. 2014. Yet another ranking function for automatic multi-word term extraction. In: Proceedings of 9th International Conference on Natural Language Processing (NLP), PolTAL'2014, September 17-19, 2014, Warsaw, Poland. Adam Przepiórkowski, Maciej Ogrodniczuk (eds.). Cham: Springer International Publishing, pp. 52-64. (Lecture Notes in Computer Science, 8686) ISBN 978-319-10887-2 International Conference on Natural Language Processing. 9, Warsaw, Poland, 17-19 September 2014.

Abstract: Term extraction is an essential task in domain knowledge acquisition. We propose two new measures to extract multiword terms from a domain-specific text. The first measure is both linguistically and statistically based. The second measure is graph-based, allowing assessment of the importance of a multiword term within a domain. Existing measures often solve some, but not all, of the problems related to term extraction, e.g., noise, silence, low frequency, large corpora, and the complexity of the multiword term extraction process. Instead, we focus on managing the entire set of problems, e.g., detecting rare terms and overcoming the low-frequency issue. We show that the two proposed measures outperform precision results previously reported for automatic multiword extraction by comparing them with the state-of-the-art reference measures. (Author's abstract)

Agris classification: C30 - Documentation and information
U30 - Research methods
U10 - Computer science, mathematics and statistics

Authors and affiliations

  • Lossio Ventura Juan Antonio, LIRMM (FRA)
  • Jonquet Clément, LIRMM (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Teisseire Maguelonne, LIRMM (FRA)

