Beyond similarity assessment: selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm

MOTIVATION: Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM s...

Bibliographic Details
Main authors: Takeda, Taikai; Hamada, Michiaki
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860613/
https://www.ncbi.nlm.nih.gov/pubmed/29040374
http://dx.doi.org/10.1093/bioinformatics/btx643
author Takeda, Taikai
Hamada, Michiaki
author_facet Takeda, Taikai
Hamada, Michiaki
author_sort Takeda, Taikai
collection PubMed
description MOTIVATION: Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. RESULTS: We developed a novel method to select superior models (including the number of hidden states) for PHMM. Our method selects models with the highest posterior probability using Factorized Information Criterion, which is widely utilized in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/bigsea-t/fab-phmm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5860613
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5860613 2018-03-28 Beyond similarity assessment: selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm Takeda, Taikai Hamada, Michiaki Bioinformatics Original Papers MOTIVATION: Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. RESULTS: We developed a novel method to select superior models (including the number of hidden states) for PHMM. Our method selects models with the highest posterior probability using Factorized Information Criterion, which is widely utilized in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/bigsea-t/fab-phmm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2018-02-15 2017-10-12 /pmc/articles/PMC5860613/ /pubmed/29040374 http://dx.doi.org/10.1093/bioinformatics/btx643 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Takeda, Taikai
Hamada, Michiaki
Beyond similarity assessment: selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm
title Beyond similarity assessment: selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860613/
https://www.ncbi.nlm.nih.gov/pubmed/29040374
http://dx.doi.org/10.1093/bioinformatics/btx643