Unsupervised key-phrase extraction from long texts with multilingual sentence transformers

Dias Hélder, Guimarães Artur, Martins Bruno, Roche Mathieu. 2023. Unsupervised key-phrase extraction from long texts with multilingual sentence transformers. In: Discovery science: 26th International Conference, DS 2023, Porto, Portugal, October 9–11, 2023, Proceedings. Bifet Albert (ed.), Lorena Ana Carolina (ed.), Ribeiro Rita P. (ed.), Gama João (ed.), Abreu Pedro H. (ed.). Cham: Springer, 141-155. (Lecture Notes in Computer Science, 14276) ISBN 978-3-031-45274-1. International Conference on Discovery Science (DS 2023). 26, Porto, Portugal, 9-11 October 2023.

Conference paper with proceedings
Published version - English
Access restricted to Cirad staff
Licensed under a Creative Commons licence.
Dias_DS2023.pdf

Abstract: Key-phrase extraction concerns retrieving a small set of phrases that encapsulate the core concepts of an input textual document. As in other text mining tasks, current methods often rely on pre-trained neural language models. Using these models, the state-of-the-art supervised systems for key-phrase extraction require large amounts of labelled data and generalize poorly outside the training domain, while unsupervised approaches generally present a lower accuracy. This paper presents a multilingual unsupervised approach to key-phrase extraction, improving upon previous methods in several ways (e.g., using representations from pre-trained Transformer models, while supporting the processing of long documents). Experimental results on datasets covering multiple languages and domains attest to the quality of the results.

Free keywords: Text Mining, Key-phrase extraction, Multilingual text processing, Transformers

European funding agencies: European Commission

European funding programme: H2020

Funded projects: (EU) MOnitoring Outbreak events for Disease surveillance in a data science context

Authors and affiliations

  • Dias Hélder, INESC (PRT)
  • Guimarães Artur, INESC (PRT)
  • Martins Bruno, INESC (PRT)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568

Source: Cirad-Agritrop (https://agritrop.cirad.fr/609244/)
