Borovikova Mariya, Ferré Arnaud, Bossy Robert, Roche Mathieu, Nédellec Claire.
2023. Could key word masking strategy improve language model?
In: Natural language processing and information systems: 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Derby, UK, June 21–23, 2023, Proceedings. Métais Elisabeth (ed.), Meziane Farid (ed.), Sugumaran Vijayan (ed.), Manning Warren (ed.), Reiff-Marganiec Stephan (ed.)
Published version
- English
Access restricted to Cirad staff. Use subject to authorization by the author or Cirad. Borovikova_et_al_NLDB2023.pdf (1MB)
URL - dataset - other repository: https://doi.org/10.57745/HVPITE
Abstract: This paper presents an enhanced approach for adapting a Language Model (LM) to a specific domain, with a focus on Named Entity Recognition (NER) and Named Entity Linking (NEL) tasks. Traditional NER/NEL methods require large amounts of labeled data, which are time- and resource-intensive to produce. Unsupervised and semi-supervised approaches overcome this limitation but suffer from lower quality. Our approach, called KeyWord Masking (KWM), fine-tunes an LM for the Masked Language Modeling (MLM) task in a special way. Our experiments demonstrate that KWM outperforms traditional methods in restoring domain-specific entities. This work is a preliminary step towards developing a more sophisticated NER/NEL system for domain-specific data.
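The abstract does not spell out how KWM selects the tokens to mask. As a minimal sketch only, assuming that KWM hides domain keywords rather than randomly chosen tokens, the selection step might look like the following. The function name `keyword_mask`, the example sentence, and the keyword list are all hypothetical and are not taken from the paper.

```python
MASK = "[MASK]"  # mask token string used by BERT-style tokenizers

def keyword_mask(tokens, keywords):
    """Replace tokens found in a domain keyword list with the mask token.

    Assumption: matching is exact and case-insensitive on whole tokens.
    Returns the masked sequence and, per position, the original token
    to restore (None where the position is not masked).
    """
    vocab = {k.lower() for k in keywords}
    masked, labels = [], []
    for tok in tokens:
        if tok.lower() in vocab:
            masked.append(MASK)
            labels.append(tok)    # target the model should restore
        else:
            masked.append(tok)
            labels.append(None)   # position excluded from the loss
    return masked, labels

# Hypothetical sentence and keyword list, for illustration only.
tokens = "Xylella fastidiosa was detected on olive trees in Apulia".split()
keywords = ["Xylella", "fastidiosa", "olive"]
masked, labels = keyword_mask(tokens, keywords)
print(" ".join(masked))
```

In standard MLM fine-tuning, the masked positions are instead drawn at random, so a sketch like this differs only in which positions are selected; the rest of the training loop would stay the same.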
Free keywords: Natural Language Processing, Named entity recognition, Language Model, BERT, Plant disease surveillance, Epidemiological surveillance
Funding agencies outside the EU: Agence Nationale de la Recherche
Funded projects: (FRA) Building epidemiological surveillance and prophylaxis with observations both near and distant
Authors and affiliations
- Borovikova Mariya, Université Paris-Saclay (FRA)
- Ferré Arnaud, Université Paris-Saclay (FRA)
- Bossy Robert, INRAE (FRA)
- Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
- Nédellec Claire, INRAE (FRA)
Other links for this publication
Source: Cirad-Agritrop (https://agritrop.cirad.fr/605120/)