Monte Carlo methods for estimating Mallows's Cp and AIC criteria for PLSR models. Illustration on agronomic spectroscopic NIR data

Lesnoff Matthieu, Roger Jean-Michel, Rutledge Douglas N. 2021. Monte Carlo methods for estimating Mallows's Cp and AIC criteria for PLSR models. Illustration on agronomic spectroscopic NIR data. Journal of Chemometrics, 35 (10):e3369, 21 p.

Journal article; Research article; Article in an impact-factor journal
Published version - English
Access restricted to Cirad staff
Use subject to authorization from the author or Cirad.
lesnoff2021.pdf (4 MB)

Quartile by subject:
  • Q1: STATISTICS & PROBABILITY
  • Q2: INSTRUMENTS & INSTRUMENTATION
  • Q2: MATHEMATICS, INTERDISCIPLINARY APPLICATIONS
  • Q3: AUTOMATION & CONTROL SYSTEMS
  • Q3: CHEMISTRY, ANALYTICAL
  • Q3: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Abstract: Mallows's Cp and the Akaike information criterion (AIC) are common criteria for selecting the dimensionality of regression models, as an alternative to cross-validation (CV) and the nonparametric bootstrap. A key parameter in the calculation of Cp and AIC is the effective number of degrees of freedom of the model, or model complexity (d). Parameter d is generally easy to calculate for linear smoothers, that is, models for which the prediction of the training response y is given by ŷ = Sy, where S is a projector matrix that does not depend on y. Nevertheless, d is more difficult to estimate for nonlinear smoothers, such as partial least squares regression (PLSR). In this article, we present two algorithms for estimating d for PLSR based on Monte Carlo simulation methods (parametric bootstrap and perturbation analysis), with particular attention to high-dimensional data. We compare these Monte Carlo methods with three other previously published algorithms. We used the d estimates to compute Cp and AIC and to select PLSR model dimensionalities, which we then compared with those selected by CV. Two real, heterogeneous agronomic near-infrared (NIR) datasets were used as examples.
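The role of d can be illustrated with a short sketch. For a fitting procedure with training predictions ŷ and noise variance σ², the effective degrees of freedom can be written d = Σᵢ cov(ŷᵢ, yᵢ)/σ², which reduces to trace(S) for a linear smoother ŷ = Sy. The Python sketch below is a hedged illustration under stated assumptions, not the authors' algorithm: it estimates d with a generic Monte Carlo perturbation scheme (add Gaussian noise of known σ to y, refit, and sum the empirical covariances between noise and fitted values), checks the estimate against trace(S) on a ridge regression (used here only because it is a linear smoother whose exact d is known), and plugs d into one common form of Cp (RSS/n + 2dσ²/n) and into a Gaussian AIC up to a constant. The function names (ridge_hat_matrix, df_monte_carlo, mallows_cp, aic_gaussian), the simulated data and the assumption that σ is known are all hypothetical; a PLSR fitting function could in principle be passed as fit_predict instead.

```python
import numpy as np


def ridge_hat_matrix(X, lam):
    """Smoother matrix S of ridge regression (no intercept): y_hat = S @ y.

    S depends only on X and lam, not on y, so ridge is a linear smoother.
    """
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)


def df_monte_carlo(fit_predict, X, y, sigma, n_sim=500, seed=0):
    """Hypothetical perturbation estimate of d = sum_i cov(y_hat_i, y_i) / sigma**2.

    Each replicate adds N(0, sigma**2) noise to y, refits with
    fit_predict(X, y_perturbed) and keeps the fitted training values; the
    per-observation covariances between noise and fits are then summed.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    eps = rng.normal(0.0, sigma, size=(n_sim, n))
    fits = np.array([fit_predict(X, y + e) for e in eps])
    eps_c = eps - eps.mean(axis=0)      # center over replicates
    fits_c = fits - fits.mean(axis=0)
    cov = (eps_c * fits_c).sum(axis=0) / (n_sim - 1)
    return cov.sum() / sigma**2


def mallows_cp(y, y_hat, d, sigma2):
    """One common form of Cp: RSS/n + 2 * d * sigma2 / n (assumed form)."""
    n = len(y)
    return np.sum((y - y_hat) ** 2) / n + 2.0 * d * sigma2 / n


def aic_gaussian(y, y_hat, d):
    """Gaussian AIC up to an additive constant: n * log(RSS/n) + 2 * d."""
    n = len(y)
    return n * np.log(np.sum((y - y_hat) ** 2) / n) + 2.0 * d


# Simulated data (hypothetical), not the NIR datasets of the article.
rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

S = ridge_hat_matrix(X, lam=5.0)


def fit(X_, y_):
    return S @ y_                      # X_ is ignored: S was precomputed from X


d_exact = np.trace(S)                  # exact d for a linear smoother
d_mc = df_monte_carlo(fit, X, y, sigma=1.0, n_sim=2000)

y_hat = S @ y
cp = mallows_cp(y, y_hat, d_mc, sigma2=1.0)
aic = aic_gaussian(y, y_hat, d_mc)
print(f"trace(S) = {d_exact:.2f}, Monte Carlo d = {d_mc:.2f}")
```

In practice σ² must itself be estimated, for example from the residuals of a low-bias model; that choice, the number of replicates and the perturbation scale all affect the estimate and are not taken from the article.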

Agrovoc keywords: simulation model, mathematical model, forecasting technique, selection criterion, statistical method

Free keywords: AIC, Cp, Degrees of freedom, Model complexity, PLSR

Cirad strategic area: CTS 7 (2019-) - Outside strategic areas

European funding agencies: European Commission

Authors and affiliations

  • Lesnoff Matthieu, CIRAD-ES-UMR SELMET (FRA) ORCID: 0000-0002-5205-9763 - corresponding author
  • Roger Jean-Michel, INRAE (FRA)
  • Rutledge Douglas N., Université Paris-Saclay (FRA)

Source: Cirad-Agritrop (https://agritrop.cirad.fr/600014/)
