
Gene Features Selection for Three-Class Disease Classification via Multiple Orthogonal Partial Least Square Discriminant Analysis and S-Plot Using Microarray Data

MOTIVATION: DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignifica...


Bibliographic Details
Main authors: Yang, Mingxing; Li, Xiumin; Li, Zhibin; Ou, Zhimin; Liu, Ming; Liu, Suhuan; Li, Xuejun; Yang, Shuyu
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3875537/
https://www.ncbi.nlm.nih.gov/pubmed/24386356
http://dx.doi.org/10.1371/journal.pone.0084253
collection PubMed
description MOTIVATION: DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignificant genes in a dataset, a feature selection approach must be employed in data analysis. The performance of cluster analysis of this high-throughput data depends on whether the feature selection approach chooses the most relevant genes associated with disease classes. RESULTS: Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub’s leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods.
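The one-model-per-class selection described in the abstract can be illustrated with a rough sketch. It is not the authors' implementation: a one-component PLS-DA score stands in for the OPLS-DA predictive score (OPLS-DA additionally removes variation orthogonal to the class label), and the function names `s_plot_stats` and `select_markers`, the ranking rule, and the synthetic data are all assumptions made for illustration. For each class, a one-vs-rest model yields a score vector, and every gene gets the two S-plot coordinates: its covariance with the score (magnitude) and its correlation with the score (reliability).

```python
import numpy as np


def s_plot_stats(X, y):
    """Covariance and correlation of every gene with a one-component PLS score.

    X: (samples, genes) expression matrix; y: 0/1 class indicator per sample.
    Stand-in for the OPLS-DA predictive score (an assumption, see lead-in).
    """
    Xc = X - X.mean(axis=0)            # mean-center each gene
    yc = y - y.mean()
    w = Xc.T @ yc                      # direction of maximal covariance with y
    w = w / np.linalg.norm(w)
    t = Xc @ w                         # sample scores on that direction (mean 0)
    n = X.shape[0]
    cov = Xc.T @ t / (n - 1)           # S-plot x-axis: magnitude
    denom = Xc.std(axis=0, ddof=1) * t.std(ddof=1)
    corr = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return cov, corr                   # corr is the S-plot y-axis: reliability


def select_markers(X, labels, top_k=3):
    """One-vs-rest selection: one model and one S-plot per class."""
    labels = np.asarray(labels)
    markers = {}
    for cls in np.unique(labels).tolist():
        y = (labels == cls).astype(float)
        cov, corr = s_plot_stats(X, y)
        # Hypothetical ranking rule: keep genes elevated in the class
        # (cov > 0) and rank them by covariance times correlation.
        score = np.where(cov > 0, cov * corr, 0.0)
        markers[cls] = np.argsort(score)[::-1][:top_k].tolist()
    return markers


# Synthetic data (not from the paper): 30 samples, 50 genes, 3 classes,
# with three genes shifted upward in each class.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)
X = rng.normal(size=(30, 50))
for cls in range(3):
    X[labels == cls, 3 * cls:3 * cls + 3] += 4.0

markers = select_markers(X, labels)
print(markers)
```

Genes that sit far from the origin on both axes are the candidates an S-plot highlights; the cut-off used here (top `top_k` by the product of the two coordinates) is only one possible choice.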
format Online
Article
Text
id pubmed-3875537
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3875537 2014-01-02 PLoS One Research Article
Public Library of Science 2013-12-30 /pmc/articles/PMC3875537/ /pubmed/24386356 http://dx.doi.org/10.1371/journal.pone.0084253 Text en © 2013 Yang et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Gene Features Selection for Three-Class Disease Classification via Multiple Orthogonal Partial Least Square Discriminant Analysis and S-Plot Using Microarray Data
topic Research Article