Dias Hélder, Guimarães Artur, Martins Bruno, Roche Mathieu.
2023. Unsupervised key-phrase extraction from long texts with multilingual sentence transformers.
In: Discovery science: 26th International Conference, DS 2023, Porto, Portugal, October 9–11, 2023, Proceedings. Bifet Albert (ed.), Lorena Ana Carolina (ed.), Ribeiro Rita P. (ed.), Gama João (ed.), Abreu Pedro H. (ed.)
Published version
- English
Access restricted to Cirad staff: Dias_DS2023.pdf (1MB)
Abstract: Key-phrase extraction concerns retrieving a small set of phrases that encapsulate the core concepts of an input textual document. As in other text mining tasks, current methods often rely on pre-trained neural language models. Using these models, the state-of-the-art supervised systems for key-phrase extraction require large amounts of labelled data and generalize poorly outside the training domain, while unsupervised approaches generally present a lower accuracy. This paper presents a multilingual unsupervised approach to key-phrase extraction, improving upon previous methods in several ways (e.g., using representations from pre-trained Transformer models, while supporting the processing of long documents). Experimental results on datasets covering multiple languages and domains attest to the quality of the results.
Free keywords: Text Mining, Key-phrase extraction, Multilingual text processing, Transformers
European funding agencies: European Commission
European funding programme: H2020
Funded projects: (EU) MOnitoring Outbreak events for Disease surveillance in a data science context
Authors and affiliations
- Dias Hélder, INESC (PRT)
- Guimarães Artur, INESC (PRT)
- Martins Bruno, INESC (PRT)
- Roche Mathieu, CIRAD-ES-UMR TETIS (FRA), ORCID: 0000-0003-3272-8568
Source: Cirad-Agritrop (https://agritrop.cirad.fr/609244/)