New feature subset selection procedures for classification of expression profiles

BACKGROUND: Methods for extracting useful information from the datasets produced by microarray experiments are at present of much interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our metho...

Bibliographic Details
Main authors:	Bø, Trond Hellem; Jonassen, Inge
Format:	Text
Language:	English
Published:	BioMed Central 2002
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC115205/ https://www.ncbi.nlm.nih.gov/pubmed/11983058

_version_	1782120247301505024
author	Bø, Trond Hellem Jonassen, Inge
author_facet	Bø, Trond Hellem Jonassen, Inge
author_sort	Bø, Trond Hellem
collection	PubMed
description	BACKGROUND: Methods for extracting useful information from the datasets produced by microarray experiments are at present of much interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our methods are based on evaluating genes in pairs and evaluating how well a pair in combination distinguishes two experiment classes. We tested the ability of our pair-based methods to select gene sets that generalize the differences between experiment classes and compared the performance relative to two standard methods. To assess the ability to generalize class differences, we studied how well the gene sets we select are suited for learning a classifier. RESULTS: We show that the gene sets selected by our methods outperform the standard methods, in some cases by a large margin, in terms of cross-validation prediction accuracy of the learned classifier. We show that on two public datasets, accurate diagnoses can be made using only 15-30 genes. Our results have implications for how to select marker genes and how many gene measurements are needed for diagnostic purposes. CONCLUSION: When looking for differential expression between experiment classes, it may not be sufficient to look at each gene in a separate universe. Evaluating combinations of genes reveals interesting information that will not be discovered otherwise. Our results show that class prediction can be improved by taking advantage of this extra information.
format	Text
id	pubmed-115205
institution	National Center for Biotechnology Information
language	English
publishDate	2002
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1152052002-06-07 New feature subset selection procedures for classification of expression profiles Bø, Trond Hellem Jonassen, Inge Genome Biol Research BACKGROUND: Methods for extracting useful information from the datasets produced by microarray experiments are at present of much interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our methods are based on evaluating genes in pairs and evaluating how well a pair in combination distinguishes two experiment classes. We tested the ability of our pair-based methods to select gene sets that generalize the differences between experiment classes and compared the performance relative to two standard methods. To assess the ability to generalize class differences, we studied how well the gene sets we select are suited for learning a classifier. RESULTS: We show that the gene sets selected by our methods outperform the standard methods, in some cases by a large margin, in terms of cross-validation prediction accuracy of the learned classifier. We show that on two public datasets, accurate diagnoses can be made using only 15-30 genes. Our results have implications for how to select marker genes and how many gene measurements are needed for diagnostic purposes. CONCLUSION: When looking for differential expression between experiment classes, it may not be sufficient to look at each gene in a separate universe. Evaluating combinations of genes reveals interesting information that will not be discovered otherwise. Our results show that class prediction can be improved by taking advantage of this extra information. BioMed Central 2002 2002-03-14 /pmc/articles/PMC115205/ /pubmed/11983058 Text en Copyright © 2002 Bø and Jonassen, licensee BioMed Central Ltd
spellingShingle	Research Bø, Trond Hellem Jonassen, Inge New feature subset selection procedures for classification of expression profiles
title	New feature subset selection procedures for classification of expression profiles
title_full	New feature subset selection procedures for classification of expression profiles
title_fullStr	New feature subset selection procedures for classification of expression profiles
title_full_unstemmed	New feature subset selection procedures for classification of expression profiles
title_short	New feature subset selection procedures for classification of expression profiles
title_sort	new feature subset selection procedures for classification of expression profiles
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC115205/ https://www.ncbi.nlm.nih.gov/pubmed/11983058
work_keys_str_mv	AT bøtrondhellem newfeaturesubsetselectionproceduresforclassificationofexpressionprofiles AT jonasseninge newfeaturesubsetselectionproceduresforclassificationofexpressionprofiles

Similar items