A jackknife-like method for classification and uncertainty assessment of multi-category tumor samples using gene expression information

BACKGROUND: The use of gene expression profiling for the classification of human cancer tumors has been widely investigated. Previous studies were successful in distinguishing several tumor types in binary problems. As there are over a hundred types of cancers, and potentially even more subtypes, it...

Bibliographic Details
Main authors: Zhang, Wensheng, Robbins, Kelly, Wang, Yupeng, Bertrand, Keith, Rekaya, Romdhane
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2876124/
https://www.ncbi.nlm.nih.gov/pubmed/20429942
http://dx.doi.org/10.1186/1471-2164-11-273
_version_ 1782181666852175872
author Zhang, Wensheng
Robbins, Kelly
Wang, Yupeng
Bertrand, Keith
Rekaya, Romdhane
author_sort Zhang, Wensheng
collection PubMed
description BACKGROUND: The use of gene expression profiling for the classification of human cancer tumors has been widely investigated. Previous studies were successful in distinguishing several tumor types in binary problems. As there are over a hundred types of cancers, and potentially even more subtypes, it is essential to develop multi-category methodologies for molecular classification for any meaningful practical application. RESULTS: A jackknife-based supervised learning method called the paired-samples test algorithm (PST), coupled with a binary classification model based on linear regression, was proposed and applied to two well-known and challenging datasets consisting of 14 (GCM dataset) and 9 (NCI60 dataset) tumor types. The results showed that the proposed method improved the prediction accuracy of the test samples for the GCM dataset, especially when the t-statistic was used in the primary feature selection. For the NCI60 dataset, the application of PST improved prediction accuracy when the number of genes used was relatively small (100 or 200). These improvements made the binary classification method more robust to the gene selection mechanism and to the number of genes used. The overall prediction accuracies were competitive with the most accurate results obtained by several previous studies on the same datasets with other methods. Furthermore, the relative confidence R(T) provided a unique insight into the sources of the uncertainty shown in the statistical classification and the potential variants within the same tumor type. CONCLUSION: We proposed a novel bagging method for the classification and uncertainty assessment of multi-category tumor samples using gene expression information. The strengths were demonstrated in the application to two benchmark datasets.
format Text
id pubmed-2876124
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2876124 2010-05-26 BMC Genomics Methodology Article BioMed Central 2010-04-29 /pmc/articles/PMC2876124/ /pubmed/20429942 http://dx.doi.org/10.1186/1471-2164-11-273 Text en Copyright ©2010 Zhang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A jackknife-like method for classification and uncertainty assessment of multi-category tumor samples using gene expression information
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2876124/
https://www.ncbi.nlm.nih.gov/pubmed/20429942
http://dx.doi.org/10.1186/1471-2164-11-273
work_keys_str_mv AT zhangwensheng ajackknifelikemethodforclassificationanduncertaintyassessmentofmulticategorytumorsamplesusinggeneexpressioninformation
AT robbinskelly ajackknifelikemethodforclassificationanduncertaintyassessmentofmulticategorytumorsamplesusinggeneexpressioninformation
AT wangyupeng ajackknifelikemethodforclassificationanduncertaintyassessmentofmulticategorytumorsamplesusinggeneexpressioninformation
AT bertrandkeith ajackknifelikemethodforclassificationanduncertaintyassessmentofmulticategorytumorsamplesusinggeneexpressioninformation
AT rekayaromdhane ajackknifelikemethodforclassificationanduncertaintyassessmentofmulticategorytumorsamplesusinggeneexpressioninformation
AT zhangwensheng jackknifelikemethodforclassificationanduncertaintyassessmentofmulticategorytumorsamplesusinggeneexpressioninformation
AT robbinskelly jackknifelikemethodforclassificationanduncertaintyassessmentofmulticategorytumorsamplesusinggeneexpressioninformation
AT wangyupeng jackknifelikemethodforclassificationanduncertaintyassessmentofmulticategorytumorsamplesusinggeneexpressioninformation
AT bertrandkeith jackknifelikemethodforclassificationanduncertaintyassessmentofmulticategorytumorsamplesusinggeneexpressioninformation
AT rekayaromdhane jackknifelikemethodforclassificationanduncertaintyassessmentofmulticategorytumorsamplesusinggeneexpressioninformation
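
The description in this record mentions a primary feature selection step based on the t-statistic, applied before binary classification. The record does not give the details of that step or of the PST algorithm itself, so the following is only a minimal sketch of one common way to rank genes for a single "one tumor type versus the rest" problem. The function names `welch_t` and `rank_genes`, the use of the Welch (unequal variance) form of the statistic, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic for two lists of expression values.

    Assumption: each group has at least two samples, because
    statistics.variance needs two or more data points.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(expr, labels, target, top_k):
    """Return indices of the top_k genes ranked by |t| for 'target vs. rest'.

    expr   : list of samples, each a list of gene expression values
    labels : tumor type of each sample
    target : the tumor type treated as the positive class
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        pos = [s[g] for s, y in zip(expr, labels) if y == target]
        neg = [s[g] for s, y in zip(expr, labels) if y != target]
        scores.append((abs(welch_t(pos, neg)), g))
    # Largest |t| first; ties broken by gene index for determinism.
    scores.sort(key=lambda p: (-p[0], p[1]))
    return [g for _, g in scores[:top_k]]


# Hypothetical toy data: 6 samples, 3 genes, two tumor types.
toy_expr = [
    [1.0, 9.0, 5.0],
    [1.2, 9.5, 4.0],
    [0.9, 8.8, 6.0],
    [1.1, 2.0, 5.5],
    [1.0, 2.2, 4.5],
    [1.3, 1.9, 5.0],
]
toy_labels = ["A", "A", "A", "B", "B", "B"]
selected = rank_genes(toy_expr, toy_labels, "A", 2)
```

In a multi-category setting, a ranking like this would be repeated once per binary problem, and the selected genes would then feed whichever binary classifier is used (linear regression in the study described here).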