A model-based information sharing protocol for profile Hidden Markov Models used for HIV-1 recombination detection

BACKGROUND: In many applications, a family of nucleotide or protein sequences classified into several subfamilies has to be modeled. Profile Hidden Markov Models (pHMMs) are widely used for this task, modeling each subfamily separately by one pHMM. However, a major drawback of this approach is the d...


Bibliographic Details
Main authors: Bulla, Ingo; Schultz, Anne-Kathrin; Chesneau, Christophe; Mark, Tanya; Serea, Florin
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4230192/
https://www.ncbi.nlm.nih.gov/pubmed/24946781
http://dx.doi.org/10.1186/1471-2105-15-205
_version_ 1782344225132642304
author Bulla, Ingo
Schultz, Anne-Kathrin
Chesneau, Christophe
Mark, Tanya
Serea, Florin
author_sort Bulla, Ingo
collection PubMed
description BACKGROUND: In many applications, a family of nucleotide or protein sequences classified into several subfamilies has to be modeled. Profile Hidden Markov Models (pHMMs) are widely used for this task, modeling each subfamily separately by one pHMM. However, a major drawback of this approach is the difficulty of dealing with subfamilies composed of very few sequences. One of the most crucial bioinformatics tasks affected by the problem of small-size subfamilies is the subtyping of human immunodeficiency virus type 1 (HIV-1) sequences, i.e., HIV-1 subtypes for which only a small number of sequences is known. RESULTS: To deal with small samples for particular subfamilies of HIV-1, we introduce a novel model-based information sharing protocol. It estimates the emission probabilities of the pHMM modeling a particular subfamily not only based on the nucleotide frequencies of the respective subfamily but also incorporating the nucleotide frequencies of all available subfamilies. To this end, the underlying probabilistic model mimics the pattern of commonality and variation between the subtypes with regard to the biological characteristics of HI viruses. In order to implement the proposed protocol, we make use of an existing HMM architecture and its associated inference engine. CONCLUSIONS: We apply the modified algorithm to classify HIV-1 sequence data in the form of partial HIV-1 sequences and semi-artificial recombinants. Thereby, we demonstrate that the performance of pHMMs can be significantly improved by the proposed technique. Moreover, we show that our algorithm performs significantly better than Simplot and Bootscanning.
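The abstract does not spell out the probabilistic model, so the following is only a minimal sketch of the general idea of sharing information across subfamilies. It is not the authors' protocol. It assumes a simple pseudo-count scheme in which the nucleotide frequencies pooled over all subfamilies act as a prior for one alignment column; the names `pooled_frequencies`, `shared_emissions`, and `prior_weight` are hypothetical.

```python
from typing import Dict

ALPHABET = "ACGT"

Counts = Dict[str, float]


def pooled_frequencies(counts_by_subfamily: Dict[str, Counts]) -> Counts:
    """Nucleotide frequencies of one alignment column, pooled over all subfamilies."""
    totals = {base: 0.0 for base in ALPHABET}
    for counts in counts_by_subfamily.values():
        for base in ALPHABET:
            totals[base] += counts.get(base, 0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No observations at all: fall back to a uniform distribution.
        return {base: 1.0 / len(ALPHABET) for base in ALPHABET}
    return {base: totals[base] / grand_total for base in ALPHABET}


def shared_emissions(
    counts_by_subfamily: Dict[str, Counts], prior_weight: float = 2.0
) -> Dict[str, Counts]:
    """Per-subfamily emission estimates shrunk toward the pooled frequencies.

    Assumption (not from the paper): the pooled frequencies are added as
    `prior_weight` pseudo-observations, so small subfamilies borrow more
    from the pool than large ones do.
    """
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")
    pooled = pooled_frequencies(counts_by_subfamily)
    estimates = {}
    for name, counts in counts_by_subfamily.items():
        n = sum(counts.get(base, 0) for base in ALPHABET)
        estimates[name] = {
            base: (counts.get(base, 0) + prior_weight * pooled[base])
            / (n + prior_weight)
            for base in ALPHABET
        }
    return estimates


# Hypothetical column: a large subfamily "B" and a subfamily "J" with two sequences.
column = {"B": {"A": 40}, "J": {"A": 1, "G": 1}}
emissions = shared_emissions(column, prior_weight=2.0)
```

In this toy column, the estimate for the two-sequence subfamily is pulled toward the pooled distribution, while the estimate for the large subfamily barely moves. A fixed `prior_weight` is the crudest possible choice; the record indicates that the published model instead reflects the pattern of commonality and variation between subtypes.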
format Online
Article
Text
id pubmed-4230192
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4230192 2014-11-14 A model-based information sharing protocol for profile Hidden Markov Models used for HIV-1 recombination detection Bulla, Ingo Schultz, Anne-Kathrin Chesneau, Christophe Mark, Tanya Serea, Florin BMC Bioinformatics Research Article
BioMed Central 2014-06-19 /pmc/articles/PMC4230192/ /pubmed/24946781 http://dx.doi.org/10.1186/1471-2105-15-205 Text en Copyright © 2014 Bulla et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle Research Article
Bulla, Ingo
Schultz, Anne-Kathrin
Chesneau, Christophe
Mark, Tanya
Serea, Florin
A model-based information sharing protocol for profile Hidden Markov Models used for HIV-1 recombination detection
title A model-based information sharing protocol for profile Hidden Markov Models used for HIV-1 recombination detection
title_sort model-based information sharing protocol for profile hidden markov models used for hiv-1 recombination detection
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4230192/
https://www.ncbi.nlm.nih.gov/pubmed/24946781
http://dx.doi.org/10.1186/1471-2105-15-205