Rutledge Douglas N., Roger Jean-Michel, Lesnoff Matthieu. 2021. Different methods for determining the dimensionality of multivariate models. Frontiers in Analytical Science, 1:754447, 15 p.
Published version
- English
File: rutledge et al frontiers 2021.pdf (3MB)
URL - other associated data: https://github.com/DNRutledge/LV_Criteria
Abstract: A tricky aspect in the use of all multivariate analysis methods is the choice of the number of Latent Variables to use in the model, whether in the case of exploratory methods such as Principal Components Analysis (PCA) or predictive methods such as Principal Components Regression (PCR), Partial Least Squares regression (PLS). For exploratory methods, we want to know which Latent Variables deserve to be selected for interpretation and which contain only noise. For predictive methods, we want to ensure that we include all the variability of interest for the prediction, without introducing variability that would lead to a reduction in the quality of the predictions for samples other than those used to create the multivariate model. In the case of predictive methods such as PLS, the most common procedure to determine the number of Latent Variables for use in the model is Cross Validation which is based on the difference between the vector of observed values, y, and the vector of predicted values, ŷ. In this article, we will first present this procedure and its extensions, and then other methods based on entirely different principles. Many of these methods may also apply to exploratory methods. These alternatives to Cross Validation include methods based on the characteristics of the regression coefficients vectors, such as the Durbin-Watson Criterion, the Morphological Factor, the Variance or Norm and the repeatability of the vectors calculated on random subsets of the individuals. Another group of methods is based on characterizing the structure of the X matrices after each successive deflation. The user is often baffled by the multitude of indicators that are available, since no single criterion (even the classical Cross-Validation) works perfectly in all cases. We propose an empirical method to facilitate the final choice of the number of Latent Variables. A set of indicators is chosen and their evolution as a function of the number of Latent Variables extracted is synthesized by a Principal Components Analysis. The set of criteria chosen here is not exhaustive, and the efficacy of the method could be improved by including others.
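Illustrative sketch (not part of the original record, and not the authors' code from the repository linked above): the abstract names two families of indicators, a Cross Validation error based on the difference between y and ŷ, and the Durbin-Watson Criterion applied to a regression coefficient vector. The Python below shows one plausible way to compute each. The function names, the use of a root mean square error as the summary of the y and ŷ difference, and all numeric values are assumptions made for illustration only.

```python
import math


def rmsecv(y_obs, y_cv):
    """Root mean square error between observed y and cross-validated y-hat."""
    if len(y_obs) != len(y_cv):
        raise ValueError("y_obs and y_cv must have the same length")
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_cv))
    return math.sqrt(press / len(y_obs))


def durbin_watson(b):
    """Durbin-Watson statistic of a vector b.

    Sum of squared successive differences divided by the sum of squares.
    Values near 0 suggest a smooth, structured vector; values around 2
    are what uncorrelated noise tends to give.
    """
    den = sum(v * v for v in b)
    if den == 0:
        raise ValueError("vector must not be all zeros")
    num = sum((b[i] - b[i - 1]) ** 2 for i in range(1, len(b)))
    return num / den


# Hypothetical cross-validated predictions for models with 1, 2 and 3
# Latent Variables (made-up numbers, not results from the article).
y_obs = [1.0, 2.0, 3.0, 4.0]
cv_predictions = {
    1: [1.5, 2.5, 2.5, 3.5],
    2: [1.1, 1.9, 3.1, 3.9],
    3: [1.2, 1.8, 3.2, 4.2],
}
scores = {k: rmsecv(y_obs, pred) for k, pred in cv_predictions.items()}
best_k = min(scores, key=scores.get)  # number of LVs with the lowest error

# Hypothetical coefficient vectors: one smooth, one jagged.
smooth = [1.0, 1.1, 1.2, 1.1, 1.0]
jagged = [1.0, -1.0, 1.0, -1.0, 1.0]
dw_smooth = durbin_watson(smooth)
dw_jagged = durbin_watson(jagged)
```

Taking the minimum of the Cross Validation error is only the simplest selection rule; as the abstract stresses, no single criterion works in all cases, which is why the article combines several indicators.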
Agrovoc keywords: disease vector, deflation, mathematical model, regression analysis
Free keywords: Multivariate models, Dimensionality, Latent variables, Regression, Cross validation
Cirad strategic field: CTS 7 (2019-) - Outside strategic fields
Authors and affiliations
- Rutledge Douglas N., Université Paris-Saclay (FRA) - corresponding author
- Roger Jean-Michel, INRAE (FRA)
- Lesnoff Matthieu, CIRAD-ES-UMR SELMET (FRA) ORCID: 0000-0002-5205-9763
Source: Cirad-Agritrop (https://agritrop.cirad.fr/600016/)