
The identification of complete domains within protein sequences using accurate E-values for semi-global alignment

The sequencing of complete genomes has created a pressing need for automated annotation of gene function. Because domains are the basic units of protein function and evolution, a gene can be annotated from a domain database by aligning domains to the corresponding protein sequence. Ideally, complete...


Bibliographic Details
Main Authors: Kann, Maricel G., Sheetlin, Sergey L., Park, Yonil, Bryant, Stephen H., Spouge, John L.
Format: Text
Language: English
Published: Oxford University Press 2007
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950549/
https://www.ncbi.nlm.nih.gov/pubmed/17596268
http://dx.doi.org/10.1093/nar/gkm414
author Kann, Maricel G.
Sheetlin, Sergey L.
Park, Yonil
Bryant, Stephen H.
Spouge, John L.
author_sort Kann, Maricel G.
collection PubMed
description The sequencing of complete genomes has created a pressing need for automated annotation of gene function. Because domains are the basic units of protein function and evolution, a gene can be annotated from a domain database by aligning domains to the corresponding protein sequence. Ideally, complete domains are aligned to protein subsequences, in a ‘semi-global alignment’. Local alignment, which aligns pieces of domains to subsequences, is common in high-throughput annotation applications, however. It is a mature technique, with the heuristics and accurate E-values required for screening large databases and evaluating the screening results. Hidden Markov models (HMMs) provide an alternative theoretical framework for semi-global alignment, but their use is limited because they lack heuristic acceleration and accurate E-values. Our new tool, GLOBAL, overcomes some limitations of previous semi-global HMMs: it has accurate E-values and the possibility of the heuristic acceleration required for high-throughput applications. Moreover, according to a standard of truth based on protein structure, two semi-global HMM alignment tools (GLOBAL and HMMer) had comparable performance in identifying complete domains, but distinctly outperformed two tools based on local alignment. When searching for complete protein domains, therefore, GLOBAL avoids disadvantages commonly associated with HMMs, yet maintains their superior retrieval performance.
format Text
id pubmed-1950549
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1950549 2007-08-22 Nucleic Acids Res Computational Biology Oxford University Press 2007-07 2007-06-27 /pmc/articles/PMC1950549/ /pubmed/17596268 http://dx.doi.org/10.1093/nar/gkm414 Text en © 2007 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The identification of complete domains within protein sequences using accurate E-values for semi-global alignment
topic Computational Biology