Using the information embedded in the testing sample to break the limits caused by the small sample size in microarray-based classification

BACKGROUND: Microarray-based tumor classification is characterized by a very large number of features (genes) and small number of samples. In such cases, statistical techniques cannot determine which genes are correlated to each tumor type. A popular solution is the use of a subset of pre-specified...

Bibliographic Details
Main authors:	Zhu, Manli, Martinez, Aleix M
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2443146/ https://www.ncbi.nlm.nih.gov/pubmed/18554411 http://dx.doi.org/10.1186/1471-2105-9-280

_version_	1782156801636040704
author	Zhu, Manli Martinez, Aleix M
author_facet	Zhu, Manli Martinez, Aleix M
author_sort	Zhu, Manli
collection	PubMed
description	BACKGROUND: Microarray-based tumor classification is characterized by a very large number of features (genes) and small number of samples. In such cases, statistical techniques cannot determine which genes are correlated to each tumor type. A popular solution is the use of a subset of pre-specified genes. However, molecular variations are generally correlated to a large number of genes. A gene that is not correlated to some disease may, by combination with other genes, express itself. RESULTS: In this paper, we propose a new classification strategy that can reduce the effect of over-fitting without the need to pre-select a small subset of genes. Our solution works by taking advantage of the information embedded in the testing samples. We note that a well-defined classification algorithm works best when the data is properly labeled. Hence, our classification algorithm will discriminate all samples best when the testing sample is assumed to belong to the correct class. We compare our solution with several well-known alternatives for tumor classification on a variety of publicly available data-sets. Our approach consistently leads to better classification results. CONCLUSION: Studies indicate that thousands of samples may be required to extract useful statistical information from microarray data. Herein, it is shown that this problem can be circumvented by using the information embedded in the testing samples.
format	Text
id	pubmed-2443146
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2443146 2008-07-07 Using the information embedded in the testing sample to break the limits caused by the small sample size in microarray-based classification Zhu, Manli Martinez, Aleix M BMC Bioinformatics Methodology Article BACKGROUND: Microarray-based tumor classification is characterized by a very large number of features (genes) and small number of samples. In such cases, statistical techniques cannot determine which genes are correlated to each tumor type. A popular solution is the use of a subset of pre-specified genes. However, molecular variations are generally correlated to a large number of genes. A gene that is not correlated to some disease may, by combination with other genes, express itself. RESULTS: In this paper, we propose a new classification strategy that can reduce the effect of over-fitting without the need to pre-select a small subset of genes. Our solution works by taking advantage of the information embedded in the testing samples. We note that a well-defined classification algorithm works best when the data is properly labeled. Hence, our classification algorithm will discriminate all samples best when the testing sample is assumed to belong to the correct class. We compare our solution with several well-known alternatives for tumor classification on a variety of publicly available data-sets. Our approach consistently leads to better classification results. CONCLUSION: Studies indicate that thousands of samples may be required to extract useful statistical information from microarray data. Herein, it is shown that this problem can be circumvented by using the information embedded in the testing samples. BioMed Central 2008-06-14 /pmc/articles/PMC2443146/ /pubmed/18554411 http://dx.doi.org/10.1186/1471-2105-9-280 Text en Copyright © 2008 Zhu and Martinez; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Zhu, Manli Martinez, Aleix M Using the information embedded in the testing sample to break the limits caused by the small sample size in microarray-based classification
title	Using the information embedded in the testing sample to break the limits caused by the small sample size in microarray-based classification
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2443146/ https://www.ncbi.nlm.nih.gov/pubmed/18554411 http://dx.doi.org/10.1186/1471-2105-9-280
work_keys_str_mv	AT zhumanli usingtheinformationembeddedinthetestingsampletobreakthelimitscausedbythesmallsamplesizeinmicroarraybasedclassification AT martinezaleixm usingtheinformationembeddedinthetestingsampletobreakthelimitscausedbythesmallsamplesizeinmicroarraybasedclassification

Similar Items