A "big-data" algorithm for KNN-PLS

Metz Maxime, Lesnoff Matthieu, Abdelghafour Florent, Akbarinia Reza, Masseglia Florent, Roger Jean-Michel. 2020. A "big-data" algorithm for KNN-PLS. Chemometrics and Intelligent Laboratory Systems, 203:104076, 8 p.

Journal article ; Research article ; Journal article with impact factor
Published version - English
Access restricted to Cirad staff
Use subject to authorisation by the author or Cirad.
599675.pdf

Quartile: Q1, Subject: INSTRUMENTS & INSTRUMENTATION / Quartile: Q1, Subject: MATHEMATICS, INTERDISCIPLINARY APPLICATIONS / Quartile: Q1, Subject: STATISTICS & PROBABILITY / Quartile: Q2, Subject: AUTOMATION & CONTROL SYSTEMS / Quartile: Q2, Subject: CHEMISTRY, ANALYTICAL / Quartile: Q2, Subject: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Abstract: A well-known issue with PLS is its difficulty in handling nonlinearities. As a solution, an extension of the method, “KNN-PLS”, was developed. However, this solution relies on a neighbourhood selection algorithm whose execution time depends strongly on the size of the database, leading to prohibitive response times. This article proposes, as an alternative, a new method designed to process large data volumes: “parSketch-PLS”. This method combines a neighbour selection method from the big-data domain, called “parSketch”, with the PLS method. Essentially, this paper presents a feasibility study on adapting big-data principles to spectral datasets in non-linear contexts. The parSketch method has not been studied in the context of chemometrics, nor with regard to the specific properties of spectral data. It approximates sample neighbourhoods on the basis of spectral distances, so the relevance of these neighbourhoods for PLS models and predictions needs to be investigated. This article compares the PLS and KNN-PLS methods with the parSketch-PLS method. In this context, PLS processes large volumes of data quickly but predicts poorly, while KNN-PLS returns accurate predictions at a much higher computational cost. This paper shows that the proposed pairing offers a good operational trade-off between prediction performance and computational cost. In addition, a comprehensive study of the input parameters of parSketch-PLS is conducted, with the objective of understanding their influence on prediction performance. This article proposes a framework to interpret the returned neighbourhoods by comparing their relative sizes with the evolution of performance and the input parameters of parSketch.
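
To illustrate the KNN-PLS baseline that the abstract describes, the following minimal Python sketch fits a separate PLS model on the k nearest neighbours of each query sample. It is not the authors' implementation: the function names (`pls1_fit`, `pls1_predict`, `knn_pls_predict`), the Euclidean distance, the NIPALS PLS1 variant and the toy data are all assumptions made for illustration. The brute-force neighbour search scans the whole database for every query, which is the cost the abstract identifies as prohibitive on large databases; parSketch itself is not reproduced here.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS (illustrative helper)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                 # centred y
    comps = []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with the residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps on it."""
    x_mean, y_mean, comps = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


def knn_pls_predict(X, y, x_new, k, n_components):
    """Local PLS: fit on the k nearest neighbours of x_new, then predict."""
    # Brute-force search: one distance per database sample for every query.
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x_new))[:k]
    model = pls1_fit([X[i] for i in order], [y[i] for i in order], n_components)
    return pls1_predict(model, x_new)


# Toy data (assumption): two collinear "wavelengths" and a curved response.
xs = [i / 10 for i in range(41)]
X_demo = [[x, 0.5 * x] for x in xs]
y_demo = [x * x for x in xs]
query, truth = [1.05, 0.525], 1.05 ** 2

local_pred = knn_pls_predict(X_demo, y_demo, query, k=6, n_components=1)
global_pred = knn_pls_predict(X_demo, y_demo, query, k=len(X_demo), n_components=1)
print("truth:", truth, "local:", local_pred, "global:", global_pred)
```

On this curved toy relation the local model should track the response more closely than a single global PLS model (obtained here by setting k to the database size), at the price of one full distance scan per prediction.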

Agrovoc keywords: chemistry, data analysis

Additional keywords: big data, algorithm, chemometrics, dataset

Free keywords: KNN-PLSDA, PLSDA, ParSketch, Big-data, Local-PLS

Agris classification: U10 - Computer science, mathematics and statistics
U50 - Physical sciences and chemistry

Cirad strategic field: CTS 7 (2019-) - Outside strategic fields

Authors and affiliations

  • Metz Maxime, Montpellier SupAgro (FRA) - corresponding author
  • Lesnoff Matthieu, CIRAD-ES-UMR SELMET (FRA) ORCID: 0000-0002-5205-9763
  • Abdelghafour Florent, Université de Montpellier (FRA)
  • Akbarinia Reza, LIRMM (FRA)
  • Masseglia Florent, LIRMM (FRA)
  • Roger Jean-Michel, Montpellier SupAgro (FRA)

Source: Cirad-Agritrop (https://agritrop.cirad.fr/599675/)
