
So you think you can PLS-DA?

BACKGROUND: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and...


Bibliographic Details
Main authors: Ruiz-Perez, Daniel; Guan, Haibin; Madhivanan, Purnima; Mathee, Kalai; Narasimhan, Giri
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7724830/
https://www.ncbi.nlm.nih.gov/pubmed/33297937
http://dx.doi.org/10.1186/s12859-019-3310-7
_version_ 1783620597425635328
author Ruiz-Perez, Daniel
Guan, Haibin
Madhivanan, Purnima
Mathee, Kalai
Narasimhan, Giri
collection PubMed
description BACKGROUND: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to its close relative from which it was initially invented, namely Principal Component Analysis (PCA).
RESULTS: We demonstrate that even though PCA ignores the information regarding the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is made aware of the class labels in its input. Our experiments range from looking at the signal-to-noise ratio in the feature selection task, to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set from 396 vaginal microbiome samples where the ground truth for the feature selection was available. All the 3D figures shown in this paper as well as the supplementary ones can be viewed interactively at http://biorg.cs.fiu.edu/plsda
CONCLUSIONS: Our results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.
format Online
Article
Text
id pubmed-7724830
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7724830 2020-12-09 So you think you can PLS-DA? Ruiz-Perez, Daniel Guan, Haibin Madhivanan, Purnima Mathee, Kalai Narasimhan, Giri BMC Bioinformatics Research
BioMed Central 2020-12-09 /pmc/articles/PMC7724830/ /pubmed/33297937 http://dx.doi.org/10.1186/s12859-019-3310-7 Text en © The Author(s) 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title So you think you can PLS-DA?
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7724830/
https://www.ncbi.nlm.nih.gov/pubmed/33297937
http://dx.doi.org/10.1186/s12859-019-3310-7