DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe

Bibliographic Details
Main Authors: Wang, Tianmin, Mori, Hiroshi, Zhang, Chong, Kurokawa, Ken, Xing, Xin-Hui, Yamada, Takuji
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4389672/
https://www.ncbi.nlm.nih.gov/pubmed/25888481
http://dx.doi.org/10.1186/s12859-015-0499-y
author Wang, Tianmin
Mori, Hiroshi
Zhang, Chong
Kurokawa, Ken
Xing, Xin-Hui
Yamada, Takuji
collection PubMed
description BACKGROUND: Computational predictions of catalytic function are vital for in-depth understanding of enzymes. Because several novel approaches performing better than the common BLAST tool are rarely applied in research, we hypothesized that there is a large gap between the number of known annotated enzymes and the actual number in the protein universe, which significantly limits our ability to extract additional biologically relevant functional information from the available sequencing data. To reliably expand the enzyme space, we developed DomSign, a highly accurate domain signature–based enzyme functional prediction tool to assign Enzyme Commission (EC) digits. RESULTS: DomSign is a top-down prediction engine that yields results comparable, or superior, to those from many benchmark EC number prediction tools, including BLASTP, when a homolog with an identity >30% is not available in the database. Performance tests showed that DomSign is a highly reliable enzyme EC number annotation tool. After multiple tests, the accuracy is thought to be greater than 90%. Thus, DomSign can be applied to large-scale datasets, with the goal of expanding the enzyme space with high fidelity. Using DomSign, we successfully increased the percentage of EC-tagged enzymes from 12% to 30% in UniProt-TrEMBL. In the Kyoto Encyclopedia of Genes and Genomes bacterial database, the percentage of EC-tagged enzymes for each bacterial genome could be increased from 26.0% to 33.2% on average. Metagenomic mining was also efficient, as exemplified by the application of DomSign to the Human Microbiome Project dataset, recovering nearly one million new EC-labeled enzymes. 
CONCLUSIONS: Our results offer preliminary confirmation of the existence of the hypothesized huge number of “hidden enzymes” in the protein universe, the identification of which could substantially further our understanding of the metabolisms of diverse organisms and also facilitate bioengineering by providing a richer enzyme resource. Furthermore, our results highlight the necessity of using more advanced computational tools than BLAST in protein database annotations to extract additional biologically relevant functional information from the available biological sequences. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0499-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4389672
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4389672 2015-04-09 DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe Wang, Tianmin Mori, Hiroshi Zhang, Chong Kurokawa, Ken Xing, Xin-Hui Yamada, Takuji BMC Bioinformatics Research Article BioMed Central 2015-03-21 /pmc/articles/PMC4389672/ /pubmed/25888481 http://dx.doi.org/10.1186/s12859-015-0499-y Text en © Wang et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4389672/
https://www.ncbi.nlm.nih.gov/pubmed/25888481
http://dx.doi.org/10.1186/s12859-015-0499-y