xHMMER3x2: Utilizing HMMER3’s speed and HMMER2’s sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation
BACKGROUND: While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation by enforcing full domain-to-sequence alignments. Since a unit of domain necessarily implies a unit of function, local-mode HMMER3 alone remains insu...
Main authors: | Yap, Choon-Kong; Eisenhaber, Birgit; Eisenhaber, Frank; Wong, Wing-Cheong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5126834/ https://www.ncbi.nlm.nih.gov/pubmed/27894340 http://dx.doi.org/10.1186/s13062-016-0163-0 |
_version_ | 1782470178707079168 |
---|---|
author | Yap, Choon-Kong Eisenhaber, Birgit Eisenhaber, Frank Wong, Wing-Cheong |
author_facet | Yap, Choon-Kong Eisenhaber, Birgit Eisenhaber, Frank Wong, Wing-Cheong |
author_sort | Yap, Choon-Kong |
collection | PubMed |
description | BACKGROUND: While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation by enforcing full domain-to-sequence alignments. Since a unit of domain necessarily implies a unit of function, local-mode HMMER3 alone remains insufficient for precise function annotation tasks. In addition, the incomparable E-values for the same domain model by different HMMER builds create difficulty when checking for domain annotation consistency on a large-scale basis. RESULTS: In this work, both the speed of HMMER3 and glocal-mode alignment of HMMER2 are combined within the xHMMER3x2 framework for tackling the large-scale domain annotation task. Briefly, HMMER3 is utilized for initial domain detection so that HMMER2 can subsequently perform the glocal-mode, sequence-to-full-domain alignments for the detected HMMER3 hits. An E-value calibration procedure is required to ensure that the search space by HMMER2 is sufficiently replicated by HMMER3. We find that the latter is straightforwardly possible for ~80% of the models in the Pfam domain library (release 29). However in the case of the remaining ~20% of HMMER3 domain models, the respective HMMER2 counterparts are more sensitive. Thus, HMMER3 searches alone are insufficient to ensure sensitivity and a HMMER2-based search needs to be initiated. When tested on the set of UniProt human sequences, xHMMER3x2 can be configured to be between 7× and 201× faster than HMMER2, but with descending domain detection sensitivity from 99.8 to 95.7% with respect to HMMER2 alone; HMMER3’s sensitivity was 95.7%. At extremes, xHMMER3x2 is either the slow glocal-mode HMMER2 or the fast HMMER3 with glocal-mode. Finally, the E-values to false-positive rates (FPR) mapping by xHMMER3x2 allows E-values of different model builds to be compared, so that any annotation discrepancies in a large-scale annotation exercise can be flagged for further examination by dissectHMMER. CONCLUSION: The xHMMER3x2 workflow allows large-scale domain annotation speed to be drastically improved over HMMER2 without compromising for domain-detection with regard to sensitivity and sequence-to-domain alignment incompleteness. The xHMMER3x2 code and its webserver (for Pfam release 27, 28 and 29) are freely available at http://xhmmer3x2.bii.a-star.edu.sg/. REVIEWERS: Reviewed by Thomas Dandekar, L. Aravind, Oliviero Carugo and Shamil Sunyaev. For the full reviews, please go to the Reviewers’ comments section. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13062-016-0163-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5126834 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5126834 2016-12-08 Biol Direct Research BioMed Central 2016-11-29 /pmc/articles/PMC5126834/ /pubmed/27894340 http://dx.doi.org/10.1186/s13062-016-0163-0 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | xHMMER3x2: Utilizing HMMER3’s speed and HMMER2’s sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5126834/ https://www.ncbi.nlm.nih.gov/pubmed/27894340 http://dx.doi.org/10.1186/s13062-016-0163-0 |