Many accurate small-discriminatory feature subsets exist in microarray transcript data: biomarker discovery

BACKGROUND: Molecular profiling generates abundance measurements for thousands of gene transcripts in biological samples such as normal and tumor tissues (data points). Given such two-class high-dimensional data, many methods have been proposed for classifying data points into one of the two classes...

Bibliographic Details
Main author:	Grate, Leslie R
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090559/ https://www.ncbi.nlm.nih.gov/pubmed/15826317 http://dx.doi.org/10.1186/1471-2105-6-97

author	Grate, Leslie R
collection	PubMed
description	BACKGROUND: Molecular profiling generates abundance measurements for thousands of gene transcripts in biological samples such as normal and tumor tissues (data points). Given such two-class high-dimensional data, many methods have been proposed for classifying data points into one of the two classes. However, finding very small sets of features able to correctly classify the data is problematic as the fundamental mathematical proposition is hard. Existing methods can find "small" feature sets, but give no hint how close this is to the true minimum size. Without fundamental mathematical advances, finding true minimum-size sets will remain elusive, and more importantly for the microarray community there will be no methods for finding them. RESULTS: We use the brute force approach of exhaustive search through all genes, gene pairs (and for some data sets gene triples). Each unique gene combination is analyzed with a few-parameter linear-hyperplane classification method looking for those combinations that form training error-free classifiers. All 10 published data sets studied are found to contain predictive small feature sets. Four contain thousands of gene pairs and 6 have single genes that perfectly discriminate. CONCLUSION: This technique discovered small sets of genes (3 or less) in published data that form accurate classifiers, yet were not reported in the prior publications. This could be a common characteristic of microarray data, thus making looking for them worth the computational cost. Such small gene sets could indicate biomarkers and portend simple medical diagnostic tests. We recommend checking for small gene sets routinely. We find 4 gene pairs and many gene triples in the large hepatocellular carcinoma (HCC, liver cancer) data set of Chen et al. The key component of these is the "placental gene of unknown function", PLAC8. Our HMM modeling indicates PLAC8 might have a domain like part of 1P59's crystal structure (a Non-Covalent Endonuclease III-DNA Complex). The previously identified HCC biomarker gene, glypican 3 (GPC3), is part of an accurate gene triple involving MT1E and ARHE. We also find small gene sets that distinguish leukemia subtypes in the large pediatric acute lymphoblastic leukemia cancer set of Yeoh et al.
format	Text
id	pubmed-1090559
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1090559 2005-05-07 Grate, Leslie R BMC Bioinformatics Research Article BioMed Central 2005-04-13 /pmc/articles/PMC1090559/ /pubmed/15826317 http://dx.doi.org/10.1186/1471-2105-6-97 Text en Copyright © 2005 Grate; licensee BioMed Central Ltd.
title	Many accurate small-discriminatory feature subsets exist in microarray transcript data: biomarker discovery
topic	Research Article
