Lesnoff Matthieu.
2024. Jchemo: Chemometrics and machine learning on high-dimensional data with Julia.
In: Abstracts of the communications presented at the 25th HélioSPIR meetings, Montpellier (France), 11-12 June 2024. Bastianelli Denis (ed.), Gilles Chaix (ed.). HélioSPIR
Published version
- English
Licensed under . 610068.pdf (813kB)
Abstract: Julia (https://julialang.org) is a programming language designed for high performance. It is an open-source project released under the MIT license. The language tries to tackle the “two-language problem”: many scientific codes are prototyped in a slow but flexible language (to test an idea quickly) and then have to be rewritten in a faster but less flexible language (e.g. C++) for practical applications. Julia allows fast computation with simple, easily readable code. Work on Julia began in 2009. Its syntax has been considered stable since version 1.0 in 2018 (current version as of June 2024: 1.10.4). Julia also offers many registered packages and a very active users' forum (https://discourse.julialang.org). The proposed poster will present Jchemo [1] (https://github.com/mlesnoff/Jchemo.jl), a Julia package (toolbox) dedicated to chemometrics and, more generally, to machine learning.
• Why did I switch from R to Julia for my chemometrics work in 2021? Running a PLSR (25 LVs) with n = 1e6 samples and p = 500 variables with my R function systematically crashed my working session (on an Intel i9 processor). On the same computer, the same function written in Julia took 8 seconds.
• Why did I choose Julia over Matlab? Because Julia is free.
Jchemo was initially dedicated to partial least squares regression (PLSR) and discrimination (PLSDA) models and their extensions, in particular locally weighted PLS models (kNN-LWPLS-R and -DA). The package was then expanded to various dimension reduction and regression/discrimination models. Besides the usual chemometrics methods (signal preprocessing, PCA, PLS, etc.), multi-block methods are available for dimension reduction (e.g. MBPCA, ComDim, rCCA) and for regression/discrimination (e.g. MBPLS, ROSAPLS, SOPLS). Various ridge and sparse models are also provided, as well as many nonlinear models useful for modeling heterogeneous data (kernel latent variables/ridge, kNN, RF, SVM). Jchemo's syntax is consistent across all functions, so it can be learned and used by non-specialists in programming.
Keywords: Chemometrics, Machine learning, Package, Julia language
Authors and affiliations
- Lesnoff Matthieu, CIRAD-ES-UMR SELMET (FRA) ORCID: 0000-0002-5205-9763
Other links for this publication
Source: Cirad-Agritrop (https://agritrop.cirad.fr/610068/)