
Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy

Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools were limited to the applications of binary responses. Our aim was to apply partial le...


Bibliographic Details
Main Authors: Huang, Chi-Cheng, Tu, Shih-Hsin, Huang, Ching-Shui, Lien, Heng-Hui, Lai, Liang-Chuan, Chuang, Eric Y.
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3893734/
https://www.ncbi.nlm.nih.gov/pubmed/24490149
http://dx.doi.org/10.1155/2013/248648
_version_ 1782299742459396096
author Huang, Chi-Cheng
Tu, Shih-Hsin
Huang, Ching-Shui
Lien, Heng-Hui
Lai, Liang-Chuan
Chuang, Eric Y.
collection PubMed
description Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools were limited to the applications of binary responses. Our aim was to apply partial least square (PLS) regression for breast cancer intrinsic taxonomy, of which five distinct molecular subtypes were identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays for PAM50 development were used as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n = 535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent (weighted Kappa: 0.988) within the training samples, but deteriorated substantially in independent samples, which could be attributed to the much larger number of samples left unclassified by PLS-regression. If these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved enormously (weighted Kappa: 0.829 as opposed to 0.541 when unclassified samples were analyzed). Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
format Online
Article
Text
id pubmed-3893734
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3893734 2014-02-02 Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy Huang, Chi-Cheng Tu, Shih-Hsin Huang, Ching-Shui Lien, Heng-Hui Lai, Liang-Chuan Chuang, Eric Y. Biomed Res Int Research Article Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools were limited to the applications of binary responses. Our aim was to apply partial least square (PLS) regression for breast cancer intrinsic taxonomy, of which five distinct molecular subtypes were identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays for PAM50 development were used as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n = 535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent (weighted Kappa: 0.988) within the training samples, but deteriorated substantially in independent samples, which could be attributed to the much larger number of samples left unclassified by PLS-regression. If these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved enormously (weighted Kappa: 0.829 as opposed to 0.541 when unclassified samples were analyzed). Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes. Hindawi Publishing Corporation 2013 2013-12-30 /pmc/articles/PMC3893734/ /pubmed/24490149 http://dx.doi.org/10.1155/2013/248648 Text en Copyright © 2013 Chi-Cheng Huang et al.
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy
topic Research Article
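Note on the agreement statistic: the description in this record reports weighted Kappa values for the agreement between PAM50 SSP and PLS-regression calls. The sketch below shows one way such a statistic can be computed for two sets of subtype labels. It is an illustration only, not the authors' code: the function name `weighted_kappa`, the linear disagreement weights, the ordering of the categories, and the toy labels are all assumptions, since the record does not state the weighting scheme or category order used in the study.

```python
def weighted_kappa(labels_a, labels_b, categories):
    """Linear-weighted Cohen's kappa for two equal-length label sequences.

    `categories` fixes the category order, which determines the weights.
    The order is an assumption here; the record does not specify one.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(labels_a)

    # Observed joint proportions of (call by method A, call by method B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each method.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    obs_dis = 0.0
    exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # linear weight (assumed scheme)
            obs_dis += w * observed[i][j]
            exp_dis += w * row[i] * col[j]

    if exp_dis == 0.0:
        # Both methods put every sample in the same single category.
        return 1.0
    return 1.0 - obs_dis / exp_dis


# Toy labels, invented for illustration; not data from the study.
CATEGORIES = ["LumA", "LumB", "HER2", "Basal", "Normal", "Unclassified"]
ssp_calls = ["LumA", "LumA", "LumB", "HER2", "Basal", "Normal"]
pls_calls = ["LumA", "Unclassified", "LumB", "HER2", "Basal", "Normal"]

kappa_all = weighted_kappa(ssp_calls, pls_calls, CATEGORIES)

# Repeat after dropping samples left unclassified by either method.
kept = [(a, b) for a, b in zip(ssp_calls, pls_calls)
        if "Unclassified" not in (a, b)]
kappa_classified = weighted_kappa([a for a, _ in kept],
                                  [b for _, b in kept],
                                  CATEGORIES[:-1])
```

Treating "Unclassified" as an extra category and then recomputing without it mirrors the two comparisons mentioned in the description (with and without unclassified samples), under the assumptions stated above.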