Identifying genes that contribute most to good classification in microarrays

BACKGROUND: The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it...

Bibliographic Details
Main Authors:	Baker, Stuart G, Kramer, Barnett S
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1574352/ https://www.ncbi.nlm.nih.gov/pubmed/16959042 http://dx.doi.org/10.1186/1471-2105-7-407

_version_	1782130293061189632
author	Baker, Stuart G Kramer, Barnett S
author_facet	Baker, Stuart G Kramer, Barnett S
author_sort	Baker, Stuart G
collection	PubMed
description	BACKGROUND: The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it often does not provide a sufficient focus for further investigation because many genes may be included by chance. Our strategy is to search for classification rules that perform well with few genes and, if they are found, identify genes that occur relatively frequently under multiple random validation (random splits into training and test samples). RESULTS: We analyzed data from four published studies related to cancer. For classification we used a filter with a nearest centroid rule that is easy to implement and has been previously shown to perform well. To comprehensively measure classification performance we used receiver operating characteristic curves. In the three data sets with good classification performance, the classification rules for 5 genes were only slightly worse than for 20 or 50 genes and somewhat better than for 1 gene. In two of these data sets, one or two genes had relatively high frequencies not noticeable with rules involving 20 or 50 genes: desmin for classifying colon cancer versus normal tissue; and zyxin and secretory granule proteoglycan genes for classifying two types of leukemia. CONCLUSION: Using multiple random validation, investigators should look for classification rules that perform well with few genes and select, for further study, genes with relatively high frequencies of occurrence in these classification rules.
format	Text
id	pubmed-1574352
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1574352 2006-09-23 Identifying genes that contribute most to good classification in microarrays Baker, Stuart G Kramer, Barnett S BMC Bioinformatics Research Article BioMed Central 2006-09-07 /pmc/articles/PMC1574352/ /pubmed/16959042 http://dx.doi.org/10.1186/1471-2105-7-407 Text en Copyright © 2006 Baker and Kramer; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Baker, Stuart G Kramer, Barnett S Identifying genes that contribute most to good classification in microarrays
title	Identifying genes that contribute most to good classification in microarrays
title_full	Identifying genes that contribute most to good classification in microarrays
title_fullStr	Identifying genes that contribute most to good classification in microarrays
title_full_unstemmed	Identifying genes that contribute most to good classification in microarrays
title_short	Identifying genes that contribute most to good classification in microarrays
title_sort	identifying genes that contribute most to good classification in microarrays
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1574352/ https://www.ncbi.nlm.nih.gov/pubmed/16959042 http://dx.doi.org/10.1186/1471-2105-7-407
work_keys_str_mv	AT bakerstuartg identifyinggenesthatcontributemosttogoodclassificationinmicroarrays AT kramerbarnetts identifyinggenesthatcontributemosttogoodclassificationinmicroarrays
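
The strategy summarized in the description field (repeated random splits into training and test samples, a filter followed by a nearest centroid rule, ROC-based performance, and counting how often each gene is selected) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record does not state which filter statistic was used, so ranking by absolute difference in class means is an assumption, as are the stratified split, the function names `auc`, `class_means`, and `multiple_random_validation`, and the parameters `n_genes`, `n_splits`, `test_frac`, and `seed`. It assumes two classes labeled 0 and 1 with at least two samples each.

```python
import random
from collections import Counter


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def class_means(rows, labels, genes):
    """Per-class mean expression (centroid) over the given gene indices."""
    cents = {}
    for c in (0, 1):
        members = [r for r, l in zip(rows, labels) if l == c]
        cents[c] = {g: sum(r[g] for r in members) / len(members) for g in genes}
    return cents


def multiple_random_validation(rows, labels, n_genes=5, n_splits=100,
                               test_frac=0.3, seed=0):
    """Return (mean test AUC, {gene index: selection frequency})."""
    rng = random.Random(seed)
    all_genes = range(len(rows[0]))
    by_class = {c: [i for i, l in enumerate(labels) if l == c] for c in (0, 1)}
    counts, aucs = Counter(), []
    for _ in range(n_splits):
        # Assumed design: stratified split so both classes appear in each part.
        test = []
        for grp in by_class.values():
            test += rng.sample(grp, max(1, round(test_frac * len(grp))))
        held_out = set(test)
        train = [i for i in range(len(rows)) if i not in held_out]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        # Filter step (assumed statistic): rank genes on training data only.
        cents = class_means(tr_rows, tr_labels, all_genes)
        ranked = sorted(all_genes, key=lambda g: -abs(cents[1][g] - cents[0][g]))
        chosen = ranked[:n_genes]
        counts.update(chosen)
        # Nearest centroid score: larger means closer to the class 1 centroid.
        scores = []
        for i in test:
            d0 = sum((rows[i][g] - cents[0][g]) ** 2 for g in chosen)
            d1 = sum((rows[i][g] - cents[1][g]) ** 2 for g in chosen)
            scores.append(d0 - d1)
        aucs.append(auc(scores, [labels[i] for i in test]))
    freq = {g: n / n_splits for g, n in counts.items()}
    return sum(aucs) / len(aucs), freq
```

Running this with a small `n_genes` and then a larger one gives a rough way to compare performance across rule sizes, and sorting `freq` by value lists the genes selected most often across splits. Selecting genes inside each split, on training data only, keeps the test samples from influencing the filter.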