
Gene function classification using Bayesian models with hierarchy-based priors

BACKGROUND: We investigate whether annotation of gene function can be improved using a classification scheme that is aware that functional classes are organized in a hierarchy. The classifiers look at phylogenic descriptors, sequence based attributes, and predicted secondary structure. We discuss th...


Bibliographic Details
Main Authors: Shahbaba, Babak; Neal, Radford M
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1618412/
https://www.ncbi.nlm.nih.gov/pubmed/17038174
http://dx.doi.org/10.1186/1471-2105-7-448
_version_ 1782130521092915200
author Shahbaba, Babak
Neal, Radford M
author_facet Shahbaba, Babak
Neal, Radford M
author_sort Shahbaba, Babak
collection PubMed
description BACKGROUND: We investigate whether annotation of gene function can be improved using a classification scheme that is aware that functional classes are organized in a hierarchy. The classifiers look at phylogenic descriptors, sequence based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. RESULTS: The results from all three models show substantial improvement over previous methods, which were based on the C5 decision tree algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining the three sources of information in this dataset, our new approach to combining data sources produces a higher accuracy rate than applying our models to each data source alone. CONCLUSION: Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.
format Text
id pubmed-1618412
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1618412 2006-10-20 Gene function classification using Bayesian models with hierarchy-based priors Shahbaba, Babak Neal, Radford M BMC Bioinformatics Research Article BioMed Central 2006-10-12 /pmc/articles/PMC1618412/ /pubmed/17038174 http://dx.doi.org/10.1186/1471-2105-7-448 Text en Copyright © 2006 Shahbaba and Neal; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Shahbaba, Babak
Neal, Radford M
Gene function classification using Bayesian models with hierarchy-based priors
title Gene function classification using Bayesian models with hierarchy-based priors
title_full Gene function classification using Bayesian models with hierarchy-based priors
title_fullStr Gene function classification using Bayesian models with hierarchy-based priors
title_full_unstemmed Gene function classification using Bayesian models with hierarchy-based priors
title_short Gene function classification using Bayesian models with hierarchy-based priors
title_sort gene function classification using bayesian models with hierarchy-based priors
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1618412/
https://www.ncbi.nlm.nih.gov/pubmed/17038174
http://dx.doi.org/10.1186/1471-2105-7-448
work_keys_str_mv AT shahbabababak genefunctionclassificationusingbayesianmodelswithhierarchybasedpriors
AT nealradfordm genefunctionclassificationusingbayesianmodelswithhierarchybasedpriors
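
Illustrative note on the description field: the abstract mentions a multinomial logit (MNL) model whose prior introduces correlations between the parameters of classes that are nearby in the hierarchy. The record does not say how that prior is built, so the following is only a minimal sketch under one assumed construction: each class's coefficient vector is the sum of independent normal vectors attached to the branches on its path from the root, so classes that share branches are correlated a priori. The toy hierarchy `PATHS`, the function names, and the `scale` parameter are all hypothetical, and the sketch omits intercepts and any posterior inference.

```python
import math
import random

# Hypothetical toy hierarchy: each leaf class maps to the branch names on its
# path from the root. Classes "a1" and "a2" share the branch "A".
PATHS = {
    "a1": ["A", "A/a1"],
    "a2": ["A", "A/a2"],
    "b1": ["B", "B/b1"],
}


def sample_class_coefficients(paths, n_features, scale=1.0, rng=None):
    """Draw one coefficient vector per class as a sum of per-branch vectors."""
    rng = rng or random.Random(0)
    branches = sorted({b for path in paths.values() for b in path})
    # Assumption: independent zero-mean normal draw per branch and feature.
    phi = {b: [rng.gauss(0.0, scale) for _ in range(n_features)] for b in branches}
    return {
        cls: [sum(phi[b][j] for b in path) for j in range(n_features)]
        for cls, path in paths.items()
    }


def mnl_probabilities(x, beta):
    """Multinomial logit: softmax of the linear scores x . beta_c."""
    scores = {c: sum(xj * bj for xj, bj in zip(x, b)) for c, b in beta.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def prior_covariance(paths, c1, c2, scale=1.0):
    """Prior covariance of one coefficient between two classes under the
    assumed construction: scale**2 times the number of shared branches."""
    return scale ** 2 * len(set(paths[c1]) & set(paths[c2]))


# Usage: draw coefficients from the assumed prior and classify one input.
beta = sample_class_coefficients(PATHS, n_features=3)
probs = mnl_probabilities([1.0, 0.5, -0.2], beta)
```

Under this assumed construction, sibling classes such as `a1` and `a2` have positive prior covariance because they share branch `A`, while `a1` and `b1` share no branch and are independent a priori.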