Agritrop

United we stand: using multiple strategies for topic labeling

Gourru Antoine, Velcin Julien, Roche Mathieu, Gravier Christophe, Poncelet Pascal. 2018. United we stand: using multiple strategies for topic labeling. In: Natural language processing and information systems: 23rd International Conference on Applications of Natural Language to Information Systems, NLDB 2018, Paris, France, June 13-15, 2018, Proceedings. Silberztein Max (ed.), Atigui Faten (ed.), Kornyshova Elena (ed.), Métais Elisabeth (ed.), Meziane Farid (ed.). CNAM. Cham: Springer, 352-363. (Lecture Notes in Computer Science, 10859) ISBN 978-3-319-91946-1 International Conference on Natural Language to Information Systems. 23, Paris, France, 13 June 2018/15 June 2018.

Conference paper with proceedings
Published version - English
Access restricted to Cirad staff
Use subject to authorization by the author or Cirad.
Gourru_et_al_NLDB_2018.pdf


Abstract: Topic labeling aims at providing a sound, possibly multi-word, label that depicts a topic drawn from a topic model. This is of the utmost practical interest for quickly grasping a topic's informational content, since the usual ranked list of words that maximizes a topic has limitations for this task. In this paper, we introduce three new unsupervised n-gram topic labelers that achieve results comparable to those of existing unsupervised topic labelers while following different assumptions. We demonstrate that combining topic labelers, even only two, makes it possible to target a 64% improvement with respect to single topic labeler approaches, and therefore opens research in that direction. Finally, we introduce a fourth topic labeler that extracts representative sentences, using Dirichlet smoothing to add contextual information. This sentence-based labeler provides strong surrogate candidates when n-gram topic labelers fall short of providing relevant labels, leading to up to 94% topic coverage.

Free keywords: Clustering, Topic Modeling, Text mining

Authors and affiliations

  • Gourru Antoine, Université de Lyon (FRA)
  • Velcin Julien, Université de Lyon (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Gravier Christophe, Université Jean Monnet (FRA)
  • Poncelet Pascal, LIRMM (FRA)

Other links for this publication

Source: Cirad-Agritrop (https://agritrop.cirad.fr/588239/)

View the record (access restricted to Agritrop)
