A jackknife-like method for classification and uncertainty assessment of multi-category tumor samples using gene expression information
BACKGROUND: The use of gene expression profiling for the classification of human cancer tumors has been widely investigated. Previous studies were successful in distinguishing several tumor types in binary problems. As there are over a hundred types of cancers, and potentially even more subtypes, it...
Main authors: | Zhang, Wensheng; Robbins, Kelly; Wang, Yupeng; Bertrand, Keith; Rekaya, Romdhane |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2876124/ https://www.ncbi.nlm.nih.gov/pubmed/20429942 http://dx.doi.org/10.1186/1471-2164-11-273 |
_version_ | 1782181666852175872 |
---|---|
author | Zhang, Wensheng Robbins, Kelly Wang, Yupeng Bertrand, Keith Rekaya, Romdhane |
author_facet | Zhang, Wensheng Robbins, Kelly Wang, Yupeng Bertrand, Keith Rekaya, Romdhane |
author_sort | Zhang, Wensheng |
collection | PubMed |
description | BACKGROUND: The use of gene expression profiling for the classification of human cancer tumors has been widely investigated. Previous studies were successful in distinguishing several tumor types in binary problems. As there are over a hundred types of cancers, and potentially even more subtypes, it is essential to develop multi-category methodologies for molecular classification for any meaningful practical application. RESULTS: A jackknife-based supervised learning method called paired-samples test algorithm (PST), coupled with a binary classification model based on linear regression, was proposed and applied to two well-known and challenging datasets consisting of 14 (GCM dataset) and 9 (NCI60 dataset) tumor types. The results showed that the proposed method improved the prediction accuracy of the test samples for the GCM dataset, especially when the t-statistic was used in the primary feature selection. For the NCI60 dataset, the application of PST improved prediction accuracy when the number of genes used was relatively small (100 or 200). These improvements made the binary classification method more robust to the gene selection mechanism and the number of genes used. The overall prediction accuracies were competitive in comparison to the most accurate results obtained by several previous studies on the same datasets and with other methods. Furthermore, the relative confidence R(T) provided a unique insight into the sources of the uncertainty shown in the statistical classification and the potential variants within the same tumor type. CONCLUSION: We proposed a novel bagging method for the classification and uncertainty assessment of multi-category tumor samples using gene expression information. The strengths were demonstrated in the application to two benchmark datasets. |
format | Text |
id | pubmed-2876124 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2876124 2010-05-26 A jackknife-like method for classification and uncertainty assessment of multi-category tumor samples using gene expression information Zhang, Wensheng Robbins, Kelly Wang, Yupeng Bertrand, Keith Rekaya, Romdhane BMC Genomics Methodology Article BACKGROUND: The use of gene expression profiling for the classification of human cancer tumors has been widely investigated. Previous studies were successful in distinguishing several tumor types in binary problems. As there are over a hundred types of cancers, and potentially even more subtypes, it is essential to develop multi-category methodologies for molecular classification for any meaningful practical application. RESULTS: A jackknife-based supervised learning method called paired-samples test algorithm (PST), coupled with a binary classification model based on linear regression, was proposed and applied to two well-known and challenging datasets consisting of 14 (GCM dataset) and 9 (NCI60 dataset) tumor types. The results showed that the proposed method improved the prediction accuracy of the test samples for the GCM dataset, especially when the t-statistic was used in the primary feature selection. For the NCI60 dataset, the application of PST improved prediction accuracy when the number of genes used was relatively small (100 or 200). These improvements made the binary classification method more robust to the gene selection mechanism and the number of genes used. The overall prediction accuracies were competitive in comparison to the most accurate results obtained by several previous studies on the same datasets and with other methods. Furthermore, the relative confidence R(T) provided a unique insight into the sources of the uncertainty shown in the statistical classification and the potential variants within the same tumor type. CONCLUSION: We proposed a novel bagging method for the classification and uncertainty assessment of multi-category tumor samples using gene expression information. The strengths were demonstrated in the application to two benchmark datasets. BioMed Central 2010-04-29 /pmc/articles/PMC2876124/ /pubmed/20429942 http://dx.doi.org/10.1186/1471-2164-11-273 Text en Copyright ©2010 Zhang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Zhang, Wensheng Robbins, Kelly Wang, Yupeng Bertrand, Keith Rekaya, Romdhane A jackknife-like method for classification and uncertainty assessment of multi-category tumor samples using gene expression information |
title | A jackknife-like method for classification and uncertainty assessment of multi-category tumor samples using gene expression information |
title_sort | jackknife-like method for classification and uncertainty assessment of multi-category tumor samples using gene expression information |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2876124/ https://www.ncbi.nlm.nih.gov/pubmed/20429942 http://dx.doi.org/10.1186/1471-2164-11-273 |