Evaluation of gene importance in microarray data based upon probability of selection

Bibliographic Details
Main authors: Fu, Li M; Fu-Liu, Casey S
Format: Text
Language: English
Published: BioMed Central, 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1274261/
https://www.ncbi.nlm.nih.gov/pubmed/15784140
http://dx.doi.org/10.1186/1471-2105-6-67
author Fu, Li M
Fu-Liu, Casey S
collection PubMed
description BACKGROUND: Microarray devices permit a genome-scale evaluation of gene function. This technology has catalyzed biomedical research and development in recent years. As many important diseases can be traced down to the gene level, a long-standing research problem is to identify specific gene expression patterns linked to metabolic characteristics that contribute to disease development and progression. The microarray approach offers an expedited solution to this problem. However, recognizing disease-related gene expression patterns embedded in microarray data remains challenging. In selecting a small set of biologically significant genes for classifier design, the high data dimensionality inherent in this problem creates a substantial amount of uncertainty. RESULTS: Here we present a model for probability analysis of selected genes in order to determine their importance. Our contribution is that we show how to derive the P value of each selected gene in multiple gene selection trials based on different combinations of data samples, and how to conduct a reliability analysis accordingly. The importance of a gene is indicated by its associated P value: in information-theoretic terms, a smaller value implies higher information content. On the microarray data concerning the subtype classification of small round blue cell tumors, we demonstrate that the method is capable of finding the smallest set of genes (19 genes) with optimal classification performance, compared with results reported in the literature. CONCLUSION: In classifier design based on microarray data, the probability value derived from gene selection over multiple combinations of data samples provides an effective mechanism for reducing the tendency to fit local data particularities.
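The RESULTS paragraph says each selected gene receives a P value from its record across multiple gene selection trials, and that a smaller P value means higher information content. The record does not state the probability model, so the Python sketch below assumes a simple null hypothesis: every trial picks `n_selected` of `n_genes` genes uniformly at random and independently, which makes a gene's selection count binomially distributed. The function names, parameters, and the binomial null are illustrative assumptions, not the authors' published formulas.

```python
import math
from collections import Counter
from typing import Iterable


def selection_counts(trials: Iterable[Iterable[str]]) -> Counter:
    """Count in how many trials each gene was selected.

    Each trial is the collection of gene IDs chosen from one combination
    of data samples; duplicates within a trial are counted once.
    """
    counts: Counter = Counter()
    for selected in trials:
        counts.update(set(selected))
    return counts


def selection_p_value(k: int, n_trials: int, n_selected: int, n_genes: int) -> float:
    """Upper tail P(K >= k) for K ~ Binomial(n_trials, n_selected / n_genes).

    Assumed null model: every trial picks n_selected of n_genes genes
    uniformly at random, independently of the other trials.
    """
    p = n_selected / n_genes  # per-trial chance a given gene is picked by luck
    return sum(
        math.comb(n_trials, i) * p**i * (1 - p) ** (n_trials - i)
        for i in range(k, n_trials + 1)
    )


def information_content(p_value: float) -> float:
    """Self-information in bits: the smaller the P value, the larger this is."""
    return -math.log2(p_value)


if __name__ == "__main__":
    # Hypothetical example: 3 trials, each selecting 2 of 100 genes.
    trials = [["g1", "g7"], ["g1", "g3"], ["g1", "g7"]]
    for gene, k in selection_counts(trials).most_common():
        pv = selection_p_value(k, n_trials=len(trials), n_selected=2, n_genes=100)
        print(f"{gene}: selected {k}/{len(trials)}, P = {pv:.3g}, "
              f"info = {information_content(pv):.1f} bits")
```

In this toy run, `g1`, chosen in every trial, gets the smallest P value and the most bits. If SciPy is available, `scipy.stats.binom.sf(k - 1, n_trials, p)` gives the same upper tail. The independence assumption is a simplification, since sample combinations that overlap produce correlated trials.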
format Text
id pubmed-1274261
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1274261 2005-10-29 Evaluation of gene importance in microarray data based upon probability of selection Fu, Li M Fu-Liu, Casey S BMC Bioinformatics Methodology Article
BioMed Central 2005-03-22 /pmc/articles/PMC1274261/ /pubmed/15784140 http://dx.doi.org/10.1186/1471-2105-6-67 Text en Copyright © 2005 Fu and Fu-Liu; licensee BioMed Central Ltd.
title Evaluation of gene importance in microarray data based upon probability of selection
topic Methodology Article