Complex Chemical Data Classification and Discrimination Using Locality Preserving Partial Least Squares Discriminant Analysis

Bibliographic Details
Main authors: Aminu, Muhammad; Ahmad, Noor Atinah
Format: Online Article Text
Language: English
Published: American Chemical Society 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7581267/
https://www.ncbi.nlm.nih.gov/pubmed/33110988
http://dx.doi.org/10.1021/acsomega.0c03362
Description
Summary: Partial least squares discriminant analysis (PLS-DA) is a well-known technique for feature extraction and discriminant analysis in chemometrics. Despite its popularity, it has been observed that PLS-DA does not automatically lead to extraction of relevant features. Feature learning and extraction depend on how well the discriminant subspace is captured. In this paper, discriminant subspace learning of chemical data is discussed from the perspective of PLS-DA and a recent extension of PLS-DA, known as locality preserving partial least squares discriminant analysis (LPPLS-DA). The objective is twofold: (a) to introduce the LPPLS-DA algorithm to the chemometrics community and (b) to demonstrate the superior discrimination capabilities of LPPLS-DA and how it can be a powerful alternative to PLS-DA. Four chemical data sets are used: three spectroscopic data sets and one that contains compositional data. Comparative performance is measured based on discrimination and classification of these data sets. To compare classification performance, the data samples are projected onto the PLS-DA and LPPLS-DA subspaces, and the projected samples are assigned to one of the groups (classes) using the nearest-neighbor classifier. The two techniques are also compared on a data visualization (discrimination) task. The results clearly show the ability of LPPLS-DA to group samples from the same class while at the same time maximizing the between-class separation. In comparison with PLS-DA, the separation of data in the projected LPPLS-DA subspace is better defined.
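
The evaluation procedure described in the summary (project samples onto a discriminant subspace, then classify the projected samples with a nearest-neighbor rule) can be illustrated with a minimal sketch. It covers only the plain PLS-DA side, using a generic NIPALS-style PLS on a one-hot class matrix; it is not the authors' code and does not implement LPPLS-DA, whose details are not given in this record. The data are synthetic, and every name here (`fit_plsda`, `transform`, `nearest_neighbor`, `make_data`) is a hypothetical helper chosen for illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center_columns(X):
    """Subtract each column's mean; return the centered rows and the means."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X], means


def fit_plsda(X, labels, n_components=2, n_iter=100):
    """Generic NIPALS-style PLS on one-hot class labels (illustrative only)."""
    classes = sorted(set(labels))
    Y = [[1.0 if y == c else 0.0 for c in classes] for y in labels]
    Xc, x_mean = center_columns(X)
    Yc, _ = center_columns(Y)
    y_cols = list(zip(*Yc))
    weights, loadings = [], []
    for _ in range(n_components):
        x_cols = list(zip(*Xc))
        # Cross-covariance X^T Y, stored as one feature-length vector per class.
        M = [[dot(xc, yc) for xc in x_cols] for yc in y_cols]
        w = list(max(M, key=lambda col: dot(col, col)))
        if dot(w, w) < 1e-18:
            break  # no covariance with the class labels is left
        # Power iteration for the dominant eigenvector of (X^T Y)(X^T Y)^T.
        for _ in range(n_iter):
            coeffs = [dot(col, w) for col in M]
            w = [sum(c * col[j] for c, col in zip(coeffs, M))
                 for j in range(len(w))]
            norm = dot(w, w) ** 0.5
            w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]           # scores
        tt = dot(t, t)
        p = [dot(xc, t) / tt for xc in x_cols]    # loadings
        # Deflate X only; this leaves X^T Y the same as deflating Y as well.
        Xc = [[v - ti * pj for v, pj in zip(row, p)]
              for row, ti in zip(Xc, t)]
        weights.append(w)
        loadings.append(p)
    return {"x_mean": x_mean, "weights": weights, "loadings": loadings}


def transform(model, X):
    """Project samples onto the subspace by repeating the deflation steps."""
    scores = []
    for row in X:
        x = [v - m for v, m in zip(row, model["x_mean"])]
        t = []
        for w, p in zip(model["weights"], model["loadings"]):
            ta = dot(x, w)
            t.append(ta)
            x = [v - ta * pj for v, pj in zip(x, p)]
        scores.append(t)
    return scores


def nearest_neighbor(train_scores, train_labels, test_scores):
    """Give each test sample the label of its closest training sample."""
    preds = []
    for s in test_scores:
        dists = [sum((a - b) ** 2 for a, b in zip(s, tr))
                 for tr in train_scores]
        preds.append(train_labels[dists.index(min(dists))])
    return preds


def make_data(rng, n_per_class):
    """Synthetic stand-in data: 3 classes, 6 features, 2 informative."""
    centers = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
    X, y = [], []
    for label, (c0, c1) in centers.items():
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(6)]
            row[0] += c0
            row[1] += c1
            X.append(row)
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(rng, 20)
X_test, y_test = make_data(rng, 10)

model = fit_plsda(X_train, y_train, n_components=2)
train_scores = transform(model, X_train)
test_scores = transform(model, X_test)
pred = nearest_neighbor(train_scores, y_train, test_scores)
accuracy = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
print(f"1-NN accuracy in the 2-component subspace: {accuracy:.2f}")
```

Comparing two subspace methods in this way would amount to swapping the fitting step while keeping the projection and nearest-neighbor steps fixed, so that any difference in accuracy reflects the subspace rather than the classifier.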