A domain-centric solution to functional genomics via dcGO Predictor

BACKGROUND: Computational/manual annotations of protein functions are one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process. This is due to the natural modularity of proteins with domains as structural, evolu...

Bibliographic Details
Main Authors: Fang, Hai; Gough, Julian
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3584936/
https://www.ncbi.nlm.nih.gov/pubmed/23514627
http://dx.doi.org/10.1186/1471-2105-14-S3-S9
author Fang, Hai
Gough, Julian
collection PubMed
description BACKGROUND: Computational/manual annotations of protein functions are one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process. This is due to the natural modularity of proteins with domains as structural, evolutionary and functional units. Sometimes two, three, or more adjacent domains (called supra-domains) are the operational unit responsible for a function, e.g. via a binding site at the interface. These supra-domains have contributed to functional diversification in higher organisms. Traditionally, functional ontologies have been applied to individual proteins rather than to families of related domains and supra-domains. We expect, however, that functional signals can to some extent be carried by protein domains and supra-domains, and consequently used in function prediction and functional genomics. RESULTS: Here we present a domain-centric Gene Ontology (dcGO) perspective. We generalize a framework for automatically inferring ontological terms associated with domains and supra-domains from full-length sequence annotations. This general framework has been applied specifically to primary protein-level annotations from UniProtKB-GOA, generating GO term associations with SCOP domains and supra-domains. The resulting 'dcGO Predictor' can be used to provide functional annotation to protein sequences. The functional annotation of sequences in the Critical Assessment of Function Annotation (CAFA) has been used as a valuable opportunity to validate our method and to have it assessed by the community. The functional annotation of all completely sequenced genomes has demonstrated the potential for domain-centric GO enrichment analysis to yield functional insights into newly sequenced or yet-to-be-annotated genomes.
This generalized framework has also been applied to other domain classifications such as InterPro and Pfam, and to other ontologies such as mammalian phenotype and disease ontology. dcGO and its predictor, including an enrichment analysis tool, are available at http://supfam.org/SUPERFAMILY/dcGO. CONCLUSIONS: As functional units, domains offer a unique perspective on function prediction regardless of whether proteins are multi-domain or single-domain. The 'dcGO Predictor' holds great promise for contributing to a domain-centric functional understanding of genomes in the next-generation sequencing era.
format Online
Article
Text
id pubmed-3584936
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3584936 2013-03-11 A domain-centric solution to functional genomics via dcGO Predictor Fang, Hai Gough, Julian BMC Bioinformatics Proceedings BioMed Central 2013-02-28 /pmc/articles/PMC3584936/ /pubmed/23514627 http://dx.doi.org/10.1186/1471-2105-14-S3-S9 Text en Copyright ©2013 Fang and Gough; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A domain-centric solution to functional genomics via dcGO Predictor
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3584936/
https://www.ncbi.nlm.nih.gov/pubmed/23514627
http://dx.doi.org/10.1186/1471-2105-14-S3-S9