Agritrop

Different methods for determining the dimensionality of multivariate models

Rutledge Douglas N., Roger Jean-Michel, Lesnoff Matthieu. 2021. Different methods for determining the dimensionality of multivariate models. Frontiers in Analytical Science, 1:754447, 15 p.

Journal article; Research article; Peer-reviewed journal article; Fully open-access journal
Published version - English
Under a Creative Commons license.
rutledge et al frontiers 2021.pdf (3MB)

URL - other associated data: https://github.com/DNRutledge/LV_Criteria

Abstract: A tricky aspect of all multivariate analysis methods is the choice of the number of Latent Variables to use in the model, whether for exploratory methods such as Principal Components Analysis (PCA) or for predictive methods such as Principal Components Regression (PCR) and Partial Least Squares regression (PLS). For exploratory methods, we want to know which Latent Variables deserve to be selected for interpretation and which contain only noise. For predictive methods, we want to include all the variability of interest for the prediction without introducing variability that would degrade the predictions for samples other than those used to create the multivariate model. For predictive methods such as PLS, the most common procedure for determining the number of Latent Variables is Cross Validation, which is based on the difference between the vector of observed values, y, and the vector of predicted values, ŷ. In this article, we first present this procedure and its extensions, and then other methods based on entirely different principles. Many of these methods may also apply to exploratory methods. These alternatives to Cross Validation include methods based on the characteristics of the regression coefficient vectors, such as the Durbin-Watson Criterion, the Morphological Factor, the Variance or Norm, and the repeatability of the vectors calculated on random subsets of the individuals. Another group of methods is based on characterizing the structure of the X matrices after each successive deflation. The user is often baffled by the multitude of available indicators, since no single criterion (even classical Cross Validation) works perfectly in all cases. We propose an empirical method to facilitate the final choice of the number of Latent Variables: a set of indicators is chosen, and their evolution as a function of the number of Latent Variables extracted is synthesized by a Principal Components Analysis. The set of criteria chosen here is not exhaustive, and the efficacy of the method could be improved by including others.
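
Illustration: one of the criteria named in the abstract, the Durbin-Watson Criterion, can be sketched in a few lines. The sketch below assumes the usual form of the statistic (sum of squared successive differences divided by the sum of squares) applied to a vector of regression coefficients whose variables are ordered, for example along a wavelength axis. The function name `durbin_watson` and the two sample vectors are hypothetical; the code is not taken from the article or from its associated repository.

```python
import math
import random


def durbin_watson(vector):
    """Durbin-Watson statistic of a sequence: sum of squared successive
    differences divided by the sum of squares."""
    values = list(vector)
    if len(values) < 2:
        raise ValueError("need at least two values")
    denominator = sum(v * v for v in values)
    if denominator == 0.0:
        raise ValueError("vector is all zeros")
    numerator = sum((b - a) ** 2 for a, b in zip(values, values[1:]))
    return numerator / denominator


# Hypothetical coefficient vectors over 200 ordered variables.
rng = random.Random(0)
smooth = [math.exp(-((i - 100) / 20.0) ** 2) for i in range(200)]  # one broad band
noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]                  # uncorrelated noise

dw_smooth = durbin_watson(smooth)
dw_noisy = durbin_watson(noisy)
print(f"structured vector: DW = {dw_smooth:.4f}")
print(f"noise vector:      DW = {dw_noisy:.4f}")
```

Under these assumptions, a smooth, structured vector gives a value close to 0 and an uncorrelated noise vector gives a value close to 2, so a jump in the statistic as Latent Variables are added may indicate that noise is entering the model. Any threshold would have to be chosen by the user.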

Agrovoc keywords: disease vectors, deflation, mathematical models, regression analysis

Free keywords: Multivariate models, Dimensionality, Latent variables, Regression, Cross validation

Cirad strategic field: CTS 7 (2019-) - Outside strategic fields

Authors and affiliations

  • Rutledge Douglas N., Université Paris-Saclay (FRA) - corresponding author
  • Roger Jean-Michel, INRAE (FRA)
  • Lesnoff Matthieu, CIRAD-ES-UMR SELMET (FRA) ORCID: 0000-0002-5205-9763

Source: Cirad-Agritrop (https://agritrop.cirad.fr/600016/)
