Jchemo: Chemometrics and machine learning on high-dimensional data with Julia

Lesnoff Matthieu. 2024. Jchemo: Chemometrics and machine learning on high-dimensional data with Julia. In : Résumés des communications présentées aux 25èmes rencontres HélioSPIR, Montpellier (France), 11-12 juin 2024. Bastianelli Denis (ed.), Gilles Chaix (ed.). HélioSPIR. Montpellier : Association HélioSPIR, Résumé, p. 36. Rencontres HélioSPIR. 25, Montpellier, France, 11 Juin 2024/12 Juin 2024.

Poster presentation

Published version - English

Abstract: Julia (https://julialang.org) is a programming language designed for high performance. It is an open-source project released under the MIT license. The language aims to solve the "two-language problem": many scientific codes are prototyped in a slow but flexible language (to test an idea quickly) and then have to be ported to a faster but less flexible language (e.g. C++) for practical applications. Julia allows fast computations with simple, easily readable code. Work on Julia began in 2009. Its syntax has been considered stable since version 1.0 in 2018 (current version as of June 2024: 1.10.4), and the language has many registered packages and a very active user forum (https://discourse.julialang.org).

The proposed poster will present Jchemo [1] (https://github.com/mlesnoff/Jchemo.jl), a Julia package (toolbox) dedicated to chemometrics and machine learning in general.
• Why did I decide in 2021 to switch from R to Julia for my chemometrics work? Running a PLSR (25 LVs) on n = 1e6 samples and p = 500 variables with my R function systematically crashed my working session (on an Intel i9 processor). On the same computer, the same function written in Julia completed the computation in 8 seconds.
• Why did I choose Julia over Matlab? Because Julia is free.

Jchemo was initially dedicated to partial least squares regression (PLSR) and discrimination (PLSDA) models and their extensions, in particular locally weighted PLS models (kNN-LWPLS-R and -DA). The package has since been expanded to a range of dimension reduction and regression/discrimination models. Besides the usual chemometrics methods (signal preprocessing, PCA, PLS, etc.), multi-block methods are available for dimension reduction (e.g. MBPCA, ComDim, rCCA) and for regression/discrimination (e.g. MBPLS, ROSAPLS, SOPLS). Various ridge and sparse models are also provided, as well as many nonlinear models useful for modeling heterogeneous data (kernel latent-variable and ridge methods, kNN, RF, SVM). The syntax of Jchemo is very consistent across functions and can therefore be learned and used by non-specialists in programming.
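
To give a concrete idea of the kind of computation behind the PLSR timing quoted above, here is a minimal sketch of a single-response PLS regression (PLS1, NIPALS with deflation) written in plain Julia with the standard library only. It is not Jchemo's implementation and does not use Jchemo's API: the function name `pls1`, its arguments and its returned fields are hypothetical, chosen for illustration, and the simulated data are much smaller than the n = 1e6 case mentioned in the abstract.

```julia
using LinearAlgebra, Statistics, Random

# Hypothetical helper (not part of Jchemo): PLS1 regression by NIPALS.
# X is an n x p matrix, y a vector of length n, nlv the number of latent variables.
function pls1(X::AbstractMatrix, y::AbstractVector, nlv::Integer)
    p = size(X, 2)
    xmean = vec(mean(X, dims = 1))
    ymean = mean(y)
    Xc = X .- xmean'              # column-centered copy of X
    yc = y .- ymean               # centered copy of y
    W = zeros(p, nlv)             # X-weights
    P = zeros(p, nlv)             # X-loadings
    q = zeros(nlv)                # y-loadings
    for a in 1:nlv
        w = Xc' * yc
        w ./= norm(w)             # unit-norm weight vector
        t = Xc * w                # scores for latent variable a
        tt = dot(t, t)
        pa = (Xc' * t) ./ tt
        q[a] = dot(yc, t) / tt
        Xc .-= t * pa'            # deflate X
        yc .-= q[a] .* t          # deflate y
        W[:, a] = w
        P[:, a] = pa
    end
    b = W * ((P' * W) \ q)        # coefficients on the original X scale
    b0 = ymean - dot(xmean, b)    # intercept
    return (b = b, b0 = b0)
end

# Small simulated example (sizes are arbitrary assumptions)
Random.seed!(1)
n, p = 1_000, 50
X = randn(n, p)
y = X * randn(p) .+ 0.1 .* randn(n)
model = pls1(X, y, 25)
yhat = model.b0 .+ X * model.b
println("Training RMSE with 25 LVs: ", sqrt(mean((y .- yhat) .^ 2)))
```

This version copies and deflates X at every step, which is simple to read but memory-hungry. For very large n, algorithms that work on the p x p cross-products are usually preferable.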

Free keywords: Chemometrics, Machine learning, Package, Julia language

Authors and affiliations