Dias Hélder, Guimarães Artur, Martins Bruno, Roche Mathieu.
2023. Unsupervised key-phrase extraction from long texts with multilingual sentence transformers.
In: Discovery Science: 26th International Conference, DS 2023, Porto, Portugal, October 9–11, 2023, Proceedings. Bifet Albert (ed.), Lorena Ana Carolina (ed.), Ribeiro Rita P. (ed.), Gama João (ed.), Abreu Pedro H. (ed.)
Published version
- English
Access restricted to Cirad staff. Dias_DS2023.pdf (1MB)
Abstract: Key-phrase extraction concerns retrieving a small set of phrases that encapsulate the core concepts of an input textual document. As in other text mining tasks, current methods often rely on pre-trained neural language models. Using these models, the state-of-the-art supervised systems for key-phrase extraction require large amounts of labelled data and generalize poorly outside the training domain, while unsupervised approaches generally present a lower accuracy. This paper presents a multilingual unsupervised approach to key-phrase extraction, improving upon previous methods in several ways (e.g., using representations from pre-trained Transformer models, while supporting the processing of long documents). Experimental results on datasets covering multiple languages and domains attest to the quality of the results.
Free keywords: Text Mining, Key-phrase extraction, Multilingual text processing, Transformers
European funding agencies: European Commission
European funding programme: H2020
Funded projects: (EU) MOnitoring Outbreak events for Disease surveillance in a data science context
Authors and affiliations
- Dias Hélder, INESC (PRT)
- Guimarães Artur, INESC (PRT)
- Martins Bruno, INESC (PRT)
- Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
Source: Cirad-Agritrop (https://agritrop.cirad.fr/609244/)