
Towards a bio-inspired approach to match heterogeneous documents

Yahi Nourelhouda, Belhadef Hacene, Roche Mathieu, Draa Amer. 2017. Towards a bio-inspired approach to match heterogeneous documents. In: Proceedings of the 13th International Conference on Web Information Systems and Technologies (WEBIST). Majchrzak Tim A. (ed.), Traverso Paolo (ed.), Krempels Karl-Heinz (ed.), Montfort Valérie (ed.). Lisbon: Scitepress, 276-283. ISBN 978-989-758-246-2. International Conference on Web Information Systems and Technologies (WEBIST), 13th, Porto, Portugal, 25-27 April 2017.

Conference paper with proceedings
Published version - English
Access restricted to Cirad staff
Use subject to authorization from the author or Cirad.
Yahi_WEBIST_2017.pdf


Abstract: Matching heterogeneous text documents from different sources means matching the data extracted from those documents, which is generally structured as vectors. Matching accuracy depends directly on the content chosen for these vectors, so the best features must be selected. In this paper, we present a new approach that uses a quantum-inspired genetic algorithm to select the minimum set of features representing the semantics of a set of text documents. Among the different Vs that characterize big data, we focus on the 'Variety' criterion: we used three semantically similar sets from different sources and retrieved the best features describing the semantics of the corpus. In the matching phase, our approach shows a significant improvement over the classic 'Bag-of-words' approach.

Free keywords: Text mining, Heterogeneous data

Agris classification: U30 - Research methods
C30 - Documentation and information
U10 - Computer science, mathematics and statistics

Authors and affiliations

  • Yahi Nourelhouda, Université Abdelhamid Mehri Constantine 2 (DZA)
  • Belhadef Hacene, Université Abdelhamid Mehri Constantine 2 (DZA)
  • Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
  • Draa Amer, Université Abdelhamid Mehri Constantine 2 (DZA)

Source: Cirad-Agritrop (https://agritrop.cirad.fr/584211/)

