Diouf Mansour, Thiam Mohammadou, Roche Mathieu.
2023. New approach to discover meaningful terms to specify cause of death from narratives verbal autopsy using TF-IDF and the LDA topic model.
In: IEEE EUROCON 2023 - 20th International Conference on Smart Technologies. IEEE
Published version
- English
Access restricted to Cirad staff. Use subject to authorization by the author or Cirad.
Abstract: Due to a lack of coroners in some remote areas of the world, epidemiological researchers have created a database for collecting causes of death, called a verbal autopsy. The unstructured verbal autopsy (VA) narratives that are collected in this database are full of hidden knowledge about mortality. However, they are under-exploited due to inadequate processing mechanisms, or some of the computational techniques used are inappropriate for the data format. In this paper, we propose an unsupervised approach that is essentially based on a new algorithm for preprocessing such data. This is not only to address the challenge of topic extraction with the Latent Dirichlet Allocation (LDA) topic model in the context of data scarcity, but also to improve the exploitation of topics (causes of death). Experiments with the Population Health Metrics Research Consortium (PHMRC) data have demonstrated the validity of the approach and have led to the identification of reliable causes of death as well as the discovery of new ones.
Free keywords: Text Mining, Natural Language Processing, Topic Modeling
Authors and affiliations
- Diouf Mansour, Université de Thiès (SEN)
- Thiam Mohammadou, Université de Thiès (SEN)
- Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
Source: Cirad-Agritrop (https://agritrop.cirad.fr/610870/)