
HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors

BACKGROUND: Protein domain classification is an important step in metagenomic annotation. The state-of-the-art method for protein domain classification is profile HMM-based alignment. However, the relatively high rates of insertions and deletions in homopolymer regions of pyrosequencing reads create...


Bibliographic Details
Main authors: Zhang, Yuan; Sun, Yanni
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3115854/
https://www.ncbi.nlm.nih.gov/pubmed/21609463
http://dx.doi.org/10.1186/1471-2105-12-198
author Zhang, Yuan
Sun, Yanni
collection PubMed
description BACKGROUND: Protein domain classification is an important step in metagenomic annotation. The state-of-the-art method for protein domain classification is profile HMM-based alignment. However, the relatively high rates of insertions and deletions in homopolymer regions of pyrosequencing reads create frameshifts, causing conventional profile HMM alignment tools to generate alignments with marginal scores. This makes error-containing gene fragments unclassifiable with conventional tools. Thus, there is a need for an accurate domain classification tool that can detect and correct sequencing errors. RESULTS: We introduce HMM-FRAME, a protein domain classification tool based on an augmented Viterbi algorithm that can incorporate error models from different sequencing platforms. HMM-FRAME corrects sequencing errors and classifies putative gene fragments into domain families. It achieved high error detection sensitivity and specificity in a data set with annotated errors. We applied HMM-FRAME in Targeted Metagenomics and a published metagenomic data set. The results showed that our tool can correct frameshifts in error-containing sequences, generate much longer alignments with significantly smaller E-values, and classify more sequences into their native families. CONCLUSIONS: HMM-FRAME provides a complementary protein domain classification tool to conventional profile HMM-based methods for data sets containing frameshifts. Its current implementation is best used for small-scale metagenomic data sets. The source code of HMM-FRAME can be downloaded at http://www.cse.msu.edu/~zhangy72/hmmframe/ and at https://sourceforge.net/projects/hmm-frame/.
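The frameshift problem described in the abstract can be illustrated with a minimal sketch. This is not HMM-FRAME's augmented Viterbi algorithm. It only shows why a single inserted base in a homopolymer run changes every downstream amino acid, which is what degrades a conventional profile HMM alignment. The read sequence, the insertion position, and the helper name `translate` are hypothetical and chosen for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)
    )


# Hypothetical error-free read: ATG GCT AAA GGT CTG GAA
clean_read = "ATGGCTAAAGGTCTGGAA"

# Simulated pyrosequencing error: one extra A in the AAA homopolymer run.
noisy_read = clean_read[:9] + "A" + clean_read[9:]

# Removing the inserted base restores the original reading frame.
corrected_read = noisy_read[:9] + noisy_read[10:]

clean_protein = translate(clean_read)
noisy_protein = translate(noisy_read)
corrected_protein = translate(corrected_read)

print(clean_protein)      # residues before the error
print(noisy_protein)      # residues after the insertion diverge
print(corrected_protein)  # matches the error-free translation
```

In this toy case the position of the error is known in advance. A tool such as HMM-FRAME has to infer it during alignment, which is why the abstract describes combining the profile HMM with a sequencing error model.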
format Online
Article
Text
id pubmed-3115854
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3115854 2011-06-16 HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors Zhang, Yuan Sun, Yanni BMC Bioinformatics Methodology Article
BioMed Central 2011-05-24 /pmc/articles/PMC3115854/ /pubmed/21609463 http://dx.doi.org/10.1186/1471-2105-12-198 Text en Copyright ©2011 Zhang and Sun; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors
topic Methodology Article