Agritrop

Shallow text clustering does not mean weak topics: how topic identification can leverage bigram feature

Velcin Julien, Roche Mathieu, Poncelet Pascal. 2016. Shallow text clustering does not mean weak topics: how topic identification can leverage bigram feature. In: Proceedings of the 3rd International Workshop, DMNLP 2016. Cellier Peggy (ed.), Charnois Thierry (ed.), Hotho Andreas (ed.), Matwin Stan (ed.), Moens Marie-Francine (ed.), Toussaint Yannick (ed.). s.l.: CEUR-WS, 33-40. (CEUR Workshop Proceedings, 1646) Interactions between Data Mining and Natural Language Processing. 3, Riva del Garda, Italy, 23 September 2016.

Conference paper with proceedings
Published version - English
Use subject to authorization by the author or Cirad.
DMNLP16_paper4.pdf

Abstract: Text clustering and topic learning are two closely related tasks. In this paper, we show that topics can be learned without strictly requiring an exact categorization. In particular, experiments on two real case studies, using a vocabulary based on bigram features, extract readable topics that cover most of the documents. Precision at 10 reaches up to 74% on a dataset of scientific abstracts with 10,000 features, which is 4% less than with unigrams only but yields more interpretable topics.
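
As an illustration only, the two notions named in the abstract, a vocabulary of bigram features capped at a fixed size and precision at k, can be sketched as follows. This is not the authors' implementation: the function names, the whitespace tokenization, the frequency-based feature selection, and the reading of precision at k as "the fraction of the first k ranked items that carry the expected label" are all assumptions made for the example.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent word pairs joined by a space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_vocabulary(docs, max_features):
    """Keep the max_features most frequent bigrams over all documents.

    Assumption: lowercased whitespace tokenization and raw frequency
    ranking; the paper may select its features differently.
    """
    counts = Counter()
    for doc in docs:
        counts.update(bigrams(doc.lower().split()))
    return [term for term, _ in counts.most_common(max_features)]


def precision_at_k(ranked_labels, target_label, k=10):
    """Fraction of the first k ranked items whose label equals target_label."""
    top = ranked_labels[:k]
    if not top:
        return 0.0
    return sum(1 for label in top if label == target_label) / len(top)


# Hypothetical toy data, for illustration only.
docs = [
    "text clustering and topic learning",
    "topic learning with text clustering",
]
vocab = build_vocabulary(docs, max_features=2)
score = precision_at_k(["a", "a", "b", "a"], "a", k=4)
```

With a real corpus, `max_features` would play the role of the 10,000-feature cap mentioned in the abstract, and `ranked_labels` would hold the known categories of the documents ranked highest for a given topic.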

Free keywords: Text clustering, Topic identification, Natural language processing

Agris classification: C30 - Documentation and information
U30 - Research methods

Authors and affiliations

  • Velcin Julien, Université de Lyon (FRA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Poncelet Pascal, LIRMM (FRA)

Source : Cirad-Agritrop (https://agritrop.cirad.fr/581229/)
