Fast Statistical Alignment

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on...

Bibliographic Details
Main authors:	Bradley, Robert K.; Roberts, Adam; Smoot, Michael; Juvekar, Sudeep; Do, Jaeyoung; Dewey, Colin; Holmes, Ian; Pachter, Lior
Format:	Text
Language:	English
Published:	Public Library of Science 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2684580/ https://www.ncbi.nlm.nih.gov/pubmed/19478997 http://dx.doi.org/10.1371/journal.pcbi.1000392

author	Bradley, Robert K.; Roberts, Adam; Smoot, Michael; Juvekar, Sudeep; Do, Jaeyoung; Dewey, Colin; Holmes, Ian; Pachter, Lior
collection	PubMed
description	We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment (FSA) program is based on pair hidden Markov models, which approximate an insertion/deletion process on a tree, and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments accompanied by estimates of alignment accuracy and uncertainty for every column and character of the alignment. Such estimates were previously available only from alignment programs that use computationally expensive Markov chain Monte Carlo approaches, yet FSA can align thousands of long sequences. Moreover, FSA uses an unsupervised, query-specific learning procedure for parameter estimation, which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.
format	Text
id	pubmed-2684580
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-2684580; 2009-05-29; Fast Statistical Alignment; PLoS Comput Biol; Research Article; Public Library of Science; 2009-05-29; /pmc/articles/PMC2684580/ /pubmed/19478997 http://dx.doi.org/10.1371/journal.pcbi.1000392; Text; en; Bradley et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Fast Statistical Alignment
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2684580/ https://www.ncbi.nlm.nih.gov/pubmed/19478997 http://dx.doi.org/10.1371/journal.pcbi.1000392
