Improving attribute exploration for the detection and correction of anomalies in an agroecological knowledge base

Saab Nassif, Huchard Marianne, Martin Pierre. 2022. Improving attribute exploration for the detection and correction of anomalies in an agroecological knowledge base. In: Actes des posters et démos : JOBIM 2022. SFBI, IFB, GDR BIM. Rennes: SFBI, Abstract, p. 37. Journées ouvertes en biologie, informatique et mathématiques (JOBIM 2022), Rennes, France, 5-8 July 2022.

Conference paper with proceedings
Published version - English
Use subject to authorisation by the author or Cirad.
ID601971.pdf (385 kB)

Published version - English
Access restricted to Cirad staff.
Use subject to authorisation by the author or Cirad.
JOBIM_poster_Nassif.pdf (915 kB)

Supplementary material: 1 poster

Abstract: Data cleaning is crucial to the knowledge discovery process. Knowledge bases such as Knomana [1] rely on data wrangling to standardise and then centralise information extracted from multiple sources. This makes Knomana prone to anomalies, i.e. incorrect or incomplete descriptions of plant use, which may lead its users to draw wrong conclusions during knowledge discovery. To detect and correct these anomalies, we propose using Attribute Exploration (AE) [2] to acquire expert knowledge and apply it to identify anomalies and to correct or complete the descriptions. AE is a procedure from Formal Concept Analysis, which considers data tables describing binary relationships between objects and attributes. AE relies on the computation of the Duquenne-Guigues basis, a complete, consistent and nonredundant set of implication rules, i.e. regularities of the form “if there is X, then there is always Y” [3]. The expert is asked to validate each generated implication or to provide a counterexample when an invalid rule is presented. Tools like ConExp [4] implement AE. With Knomana holding 35 attributes covering over 45,000 descriptions of plant use, the number of computed rules is in the thousands [5]. It is therefore important to display these rules in a relevant and time-saving order. To this end, this poster presents an improvement of AE. During AE, the computed rules are shown to the expert one after another in the lectic order, in which set A is presented before set B if the smallest element on which they differ belongs to B. By this definition, the lectic order does not take into account the nature of the data it is applied to, and consequently the implications are not displayed in a meaningful order, i.e. an order that reflects the expert's interest in a particular type of data. We therefore propose that experts sort the data before exploring the attributes.
Experts are given the means to group attributes into categories and to order them by relevance; the table columns are then rearranged in accordance with the definition of the lectic order, so that the most relevant implications are generated first. Applying this change to a single data table made it possible to adapt AE to the interests of the expert. As a next step, we plan to extend this technique to relational data, making it applicable to datasets that employ ternary relationships, as is the case in the agroecological knowledge base Knomana.
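
The two notions the abstract relies on, implication validity and the lectic order, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the toy context, the attribute names (`insecticide`, `leaf`, `field`) and the helper functions are made up for illustration and are taken neither from Knomana nor from the authors' implementation. It only shows that an implication is refuted by a counterexample object, and that changing the column order changes the lectic order in which premises would be listed.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Attrs = FrozenSet[str]

# Hypothetical toy context: object -> set of attributes it has (not Knomana data).
context: Dict[str, Attrs] = {
    "d1": frozenset({"insecticide", "leaf"}),
    "d2": frozenset({"insecticide", "leaf", "field"}),
    "d3": frozenset({"leaf"}),
    "d4": frozenset({"insecticide", "field"}),
}


def counterexamples(ctx: Dict[str, Attrs], premise: Attrs, conclusion: Attrs) -> List[str]:
    """Objects that have every premise attribute but lack some conclusion attribute."""
    return [obj for obj, attrs in ctx.items()
            if premise <= attrs and not conclusion <= attrs]


def holds(ctx: Dict[str, Attrs], premise: Attrs, conclusion: Attrs) -> bool:
    """An implication premise -> conclusion holds if it has no counterexample."""
    return not counterexamples(ctx, premise, conclusion)


def lectic_less(a: Attrs, b: Attrs, order: Sequence[str]) -> bool:
    """True if a precedes b: the smallest attribute on which they differ belongs to b."""
    for attr in order:
        if (attr in a) != (attr in b):
            return attr in b
    return False  # a and b are equal


def lectic_key(s: Attrs, order: Sequence[str]) -> Tuple[bool, ...]:
    """Sort key equivalent to lectic_less: membership flags in column order."""
    return tuple(attr in s for attr in order)


premises = [frozenset({"insecticide"}), frozenset({"leaf"}),
            frozenset({"field"}), frozenset({"insecticide", "leaf"})]

order_1 = ["insecticide", "leaf", "field"]   # original column order
order_2 = ["field", "leaf", "insecticide"]   # columns rearranged by the expert

by_order_1 = sorted(premises, key=lambda s: lectic_key(s, order_1))
by_order_2 = sorted(premises, key=lambda s: lectic_key(s, order_2))

print(holds(context, frozenset({"field"}), frozenset({"insecticide"})))
print(counterexamples(context, frozenset({"insecticide"}), frozenset({"leaf"})))
print([sorted(s) for s in by_order_1])
print([sorted(s) for s in by_order_2])
```

In this toy case the same four premises are listed in a different sequence under the two column orders, which is the effect the proposed rearrangement exploits. How categories and relevance ranks are mapped to a concrete column order is not specified in the abstract and is left open here.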

Free keywords: Pesticidal plant, Formal Concept Analysis, Anomaly

Authors and affiliations

  • Saab Nassif, Université de Montpellier (FRA)
  • Huchard Marianne, Université de Montpellier (FRA)
  • Martin Pierre, CIRAD-PERSYST-UPR AIDA (FRA) ORCID: 0000-0002-4874-5795

Source : Cirad-Agritrop (https://agritrop.cirad.fr/601971/)
