Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
BACKGROUND: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform var...
Main Authors: | , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3133555/ https://www.ncbi.nlm.nih.gov/pubmed/21693065 http://dx.doi.org/10.1186/1471-2105-12-253 |
_version_ | 1782207913777954816 |
---|---|
author | Lê Cao, Kim-Anh Boitard, Simon Besse, Philippe |
author_facet | Lê Cao, Kim-Anh Boitard, Simon Besse, Philippe |
author_sort | Lê Cao, Kim-Anh |
collection | PubMed |
description | BACKGROUND: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. RESULTS: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. CONCLUSIONS: sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets. |
format | Online Article Text |
id | pubmed-3133555 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-31335552011-07-12 Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems Lê Cao, Kim-Anh Boitard, Simon Besse, Philippe BMC Bioinformatics Research Article BACKGROUND: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. RESULTS: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. CONCLUSIONS: sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets. BioMed Central 2011-06-22 /pmc/articles/PMC3133555/ /pubmed/21693065 http://dx.doi.org/10.1186/1471-2105-12-253 Text en Copyright ©2011 Lê Cao et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Lê Cao, Kim-Anh Boitard, Simon Besse, Philippe Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems |
title | Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems |
title_full | Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems |
title_fullStr | Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems |
title_full_unstemmed | Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems |
title_short | Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems |
title_sort | sparse pls discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3133555/ https://www.ncbi.nlm.nih.gov/pubmed/21693065 http://dx.doi.org/10.1186/1471-2105-12-253 |
work_keys_str_mv | AT lecaokimanh sparseplsdiscriminantanalysisbiologicallyrelevantfeatureselectionandgraphicaldisplaysformulticlassproblems AT boitardsimon sparseplsdiscriminantanalysisbiologicallyrelevantfeatureselectionandgraphicaldisplaysformulticlassproblems AT bessephilippe sparseplsdiscriminantanalysisbiologicallyrelevantfeatureselectionandgraphicaldisplaysformulticlassproblems |