Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems

Bibliographic Details
Main authors: Lê Cao, Kim-Anh; Boitard, Simon; Besse, Philippe
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3133555/
https://www.ncbi.nlm.nih.gov/pubmed/21693065
http://dx.doi.org/10.1186/1471-2105-12-253
_version_ 1782207913777954816
author Lê Cao, Kim-Anh
Boitard, Simon
Besse, Philippe
author_facet Lê Cao, Kim-Anh
Boitard, Simon
Besse, Philippe
author_sort Lê Cao, Kim-Anh
collection PubMed
description BACKGROUND: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. RESULTS: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. CONCLUSIONS: sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
format Online
Article
Text
id pubmed-3133555
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-31335552011-07-12 Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems Lê Cao, Kim-Anh Boitard, Simon Besse, Philippe BMC Bioinformatics Research Article BACKGROUND: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. RESULTS: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. CONCLUSIONS: sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets. BioMed Central 2011-06-22 /pmc/articles/PMC3133555/ /pubmed/21693065 http://dx.doi.org/10.1186/1471-2105-12-253 Text en Copyright ©2011 Lê Cao et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Lê Cao, Kim-Anh
Boitard, Simon
Besse, Philippe
Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
title Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
title_full Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
title_fullStr Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
title_full_unstemmed Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
title_short Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
title_sort sparse pls discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3133555/
https://www.ncbi.nlm.nih.gov/pubmed/21693065
http://dx.doi.org/10.1186/1471-2105-12-253
work_keys_str_mv AT lecaokimanh sparseplsdiscriminantanalysisbiologicallyrelevantfeatureselectionandgraphicaldisplaysformulticlassproblems
AT boitardsimon sparseplsdiscriminantanalysisbiologicallyrelevantfeatureselectionandgraphicaldisplaysformulticlassproblems
AT bessephilippe sparseplsdiscriminantanalysisbiologicallyrelevantfeatureselectionandgraphicaldisplaysformulticlassproblems