New feature subset selection procedures for classification of expression profiles

BACKGROUND: Methods for extracting useful information from the datasets produced by microarray experiments are at present of much interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our metho...

Bibliographic Details
Main authors: Bø, Trond Hellem, Jonassen, Inge
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC115205/
https://www.ncbi.nlm.nih.gov/pubmed/11983058
author Bø, Trond Hellem
Jonassen, Inge
collection PubMed
description BACKGROUND: Methods for extracting useful information from the datasets produced by microarray experiments are at present of much interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our methods are based on evaluating genes in pairs and evaluating how well a pair in combination distinguishes two experiment classes. We tested the ability of our pair-based methods to select gene sets that generalize the differences between experiment classes and compared the performance relative to two standard methods. To assess the ability to generalize class differences, we studied how well the gene sets we select are suited for learning a classifier. RESULTS: We show that the gene sets selected by our methods outperform the standard methods, in some cases by a large margin, in terms of cross-validation prediction accuracy of the learned classifier. We show that on two public datasets, accurate diagnoses can be made using only 15-30 genes. Our results have implications for how to select marker genes and how many gene measurements are needed for diagnostic purposes. CONCLUSION: When looking for differential expression between experiment classes, it may not be sufficient to look at each gene in a separate universe. Evaluating combinations of genes reveals interesting information that will not be discovered otherwise. Our results show that class prediction can be improved by taking advantage of this extra information.
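The pair-based evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not state how a gene pair is scored, so the scoring rule here is an assumption: each pair of genes is projected onto a diagonal-linear-discriminant-style axis, and the projected values are compared between the two classes with a t-statistic. All function names (`split_by_class`, `t_score`, `dld_weight`, `pair_score`, `rank_pairs`) and the toy data are hypothetical and are not taken from the authors' software.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def split_by_class(values, labels):
    """Split one list of values into the two classes (labels 0 and 1)."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    return a, b


def t_score(a, b):
    """Absolute two-sample t-statistic (unequal-variance form)."""
    diff = abs(mean(a) - mean(b))
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / se


def dld_weight(values, labels):
    """Assumed axis weight for one gene: class mean difference over pooled variance."""
    a, b = split_by_class(values, labels)
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / pooled if pooled > 0 else 0.0


def pair_score(x, y, labels):
    """Score a gene pair by t-scoring its projection onto the assumed axis."""
    wx, wy = dld_weight(x, labels), dld_weight(y, labels)
    projected = [wx * u + wy * v for u, v in zip(x, y)]
    return t_score(*split_by_class(projected, labels))


def rank_pairs(expr, labels):
    """Return (score, gene_a, gene_b) for every gene pair, best pair first."""
    scored = [
        (pair_score(expr[g], expr[h], labels), g, h)
        for g, h in combinations(sorted(expr), 2)
    ]
    return sorted(scored, reverse=True)


# Toy data (made up): g1 and g2 each separate the classes weakly on their own
# because of a shared nuisance signal, which cancels when the two are combined.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "g1": [-3.0, -1.0, 1.0, 3.0, -2.0, 0.0, 2.0, 4.0],
    "g2": [3.1, 0.9, -0.9, -3.1, 3.9, 2.1, -0.1, -1.9],
    "g3": [0.5, 0.4, 0.6, 0.5, 0.5, 0.6, 0.4, 0.5],
}
ranked = rank_pairs(toy, labels)
for score, gene_a, gene_b in ranked:
    print(f"{gene_a}+{gene_b}: {score:.2f}")
```

In this toy example the pair g1+g2 ranks first with a score far above what either gene reaches alone, which is the kind of information that ranking each gene separately would miss. A full procedure would go on to select a gene set from the ranked pairs and estimate prediction accuracy by cross-validation, which this sketch does not attempt.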
format Text
id pubmed-115205
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-115205 2002-06-07 New feature subset selection procedures for classification of expression profiles Bø, Trond Hellem Jonassen, Inge Genome Biol Research BioMed Central 2002 2002-03-14 /pmc/articles/PMC115205/ /pubmed/11983058 Text en Copyright © 2002 Bø and Jonassen, licensee BioMed Central Ltd
title New feature subset selection procedures for classification of expression profiles
topic Research