DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe
BACKGROUND: Computational predictions of catalytic function are vital for in-depth understanding of enzymes. Because several novel approaches performing better than the common BLAST tool are rarely applied in research, we hypothesized that there is a large gap between the number of known annotated e...
Main Authors: | Wang, Tianmin; Mori, Hiroshi; Zhang, Chong; Kurokawa, Ken; Xing, Xin-Hui; Yamada, Takuji |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4389672/ https://www.ncbi.nlm.nih.gov/pubmed/25888481 http://dx.doi.org/10.1186/s12859-015-0499-y |
_version_ | 1782365600292536320 |
---|---|
author | Wang, Tianmin Mori, Hiroshi Zhang, Chong Kurokawa, Ken Xing, Xin-Hui Yamada, Takuji |
author_facet | Wang, Tianmin Mori, Hiroshi Zhang, Chong Kurokawa, Ken Xing, Xin-Hui Yamada, Takuji |
author_sort | Wang, Tianmin |
collection | PubMed |
description | BACKGROUND: Computational predictions of catalytic function are vital for in-depth understanding of enzymes. Because several novel approaches performing better than the common BLAST tool are rarely applied in research, we hypothesized that there is a large gap between the number of known annotated enzymes and the actual number in the protein universe, which significantly limits our ability to extract additional biologically relevant functional information from the available sequencing data. To reliably expand the enzyme space, we developed DomSign, a highly accurate domain signature–based enzyme functional prediction tool to assign Enzyme Commission (EC) digits. RESULTS: DomSign is a top-down prediction engine that yields results comparable, or superior, to those from many benchmark EC number prediction tools, including BLASTP, when a homolog with an identity >30% is not available in the database. Performance tests showed that DomSign is a highly reliable enzyme EC number annotation tool. After multiple tests, the accuracy is thought to be greater than 90%. Thus, DomSign can be applied to large-scale datasets, with the goal of expanding the enzyme space with high fidelity. Using DomSign, we successfully increased the percentage of EC-tagged enzymes from 12% to 30% in UniProt-TrEMBL. In the Kyoto Encyclopedia of Genes and Genomes bacterial database, the percentage of EC-tagged enzymes for each bacterial genome could be increased from 26.0% to 33.2% on average. Metagenomic mining was also efficient, as exemplified by the application of DomSign to the Human Microbiome Project dataset, recovering nearly one million new EC-labeled enzymes. 
CONCLUSIONS: Our results offer preliminary confirmation of the existence of the hypothesized huge number of “hidden enzymes” in the protein universe, the identification of which could substantially further our understanding of the metabolisms of diverse organisms and also facilitate bioengineering by providing a richer enzyme resource. Furthermore, our results highlight the necessity of using more advanced computational tools than BLAST in protein database annotations to extract additional biologically relevant functional information from the available biological sequences. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0499-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4389672 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43896722015-04-09 DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe Wang, Tianmin Mori, Hiroshi Zhang, Chong Kurokawa, Ken Xing, Xin-Hui Yamada, Takuji BMC Bioinformatics Research Article BACKGROUND: Computational predictions of catalytic function are vital for in-depth understanding of enzymes. Because several novel approaches performing better than the common BLAST tool are rarely applied in research, we hypothesized that there is a large gap between the number of known annotated enzymes and the actual number in the protein universe, which significantly limits our ability to extract additional biologically relevant functional information from the available sequencing data. To reliably expand the enzyme space, we developed DomSign, a highly accurate domain signature–based enzyme functional prediction tool to assign Enzyme Commission (EC) digits. RESULTS: DomSign is a top-down prediction engine that yields results comparable, or superior, to those from many benchmark EC number prediction tools, including BLASTP, when a homolog with an identity >30% is not available in the database. Performance tests showed that DomSign is a highly reliable enzyme EC number annotation tool. After multiple tests, the accuracy is thought to be greater than 90%. Thus, DomSign can be applied to large-scale datasets, with the goal of expanding the enzyme space with high fidelity. Using DomSign, we successfully increased the percentage of EC-tagged enzymes from 12% to 30% in UniProt-TrEMBL. In the Kyoto Encyclopedia of Genes and Genomes bacterial database, the percentage of EC-tagged enzymes for each bacterial genome could be increased from 26.0% to 33.2% on average. Metagenomic mining was also efficient, as exemplified by the application of DomSign to the Human Microbiome Project dataset, recovering nearly one million new EC-labeled enzymes. 
CONCLUSIONS: Our results offer preliminary confirmation of the existence of the hypothesized huge number of “hidden enzymes” in the protein universe, the identification of which could substantially further our understanding of the metabolisms of diverse organisms and also facilitate bioengineering by providing a richer enzyme resource. Furthermore, our results highlight the necessity of using more advanced computational tools than BLAST in protein database annotations to extract additional biologically relevant functional information from the available biological sequences. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0499-y) contains supplementary material, which is available to authorized users. BioMed Central 2015-03-21 /pmc/articles/PMC4389672/ /pubmed/25888481 http://dx.doi.org/10.1186/s12859-015-0499-y Text en © Wang et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Wang, Tianmin Mori, Hiroshi Zhang, Chong Kurokawa, Ken Xing, Xin-Hui Yamada, Takuji DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe |
title | DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe |
title_full | DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe |
title_fullStr | DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe |
title_full_unstemmed | DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe |
title_short | DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe |
title_sort | domsign: a top-down annotation pipeline to enlarge enzyme space in the protein universe |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4389672/ https://www.ncbi.nlm.nih.gov/pubmed/25888481 http://dx.doi.org/10.1186/s12859-015-0499-y |
work_keys_str_mv | AT wangtianmin domsignatopdownannotationpipelinetoenlargeenzymespaceintheproteinuniverse AT morihiroshi domsignatopdownannotationpipelinetoenlargeenzymespaceintheproteinuniverse AT zhangchong domsignatopdownannotationpipelinetoenlargeenzymespaceintheproteinuniverse AT kurokawaken domsignatopdownannotationpipelinetoenlargeenzymespaceintheproteinuniverse AT xingxinhui domsignatopdownannotationpipelinetoenlargeenzymespaceintheproteinuniverse AT yamadatakuji domsignatopdownannotationpipelinetoenlargeenzymespaceintheproteinuniverse |