Bayesian Genome Assembly and Assessment by Markov Chain Monte Carlo Sampling

Bibliographic Details
Main Authors: Howison, Mark; Zapata, Felipe; Edwards, Erika J.; Dunn, Casey W.
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2014
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4072599/
https://www.ncbi.nlm.nih.gov/pubmed/24968249
http://dx.doi.org/10.1371/journal.pone.0099497
Collection: PubMed

Abstract

Most genome assemblers construct point estimates, choosing only a single genome sequence from among many alternative hypotheses that are supported by the data. We present a Markov chain Monte Carlo approach to sequence assembly that instead generates distributions of assembly hypotheses with posterior probabilities, providing an explicit statistical framework for evaluating alternative hypotheses and assessing assembly uncertainty. We implement this approach in a prototype assembler, called Genome Assembly by Bayesian Inference (GABI), and illustrate its application to the bacteriophage φX174. Our sampling strategy achieves both good mixing and convergence on Illumina test data for φX174, demonstrating the feasibility of our approach. We summarize the posterior distribution of assembly hypotheses generated by GABI as a majority-rule consensus assembly. Then we compare the posterior distribution to external assemblies of the same test data, and annotate those assemblies by assigning posterior probabilities to features that are in common with GABI’s assembly graph. GABI is freely available under a GPL license from https://bitbucket.org/mhowison/gabi.
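
The abstract describes three steps: sampling assembly hypotheses by MCMC, summarizing the samples as a majority-rule consensus, and assigning posterior probabilities to features shared with external assemblies. The sketch below illustrates those steps on a toy problem; it is not GABI's implementation. It assumes, hypothetically, that a hypothesis is a set of candidate features (stand-ins for assembly-graph edges named `e1` to `e4`) scored by made-up independent log-odds in `LOG_ODDS`, and it swaps in a generic Metropolis-Hastings sampler with a symmetric "toggle one feature" proposal. GABI's actual likelihood, proposal moves, and graph representation are not described in this record.

```python
import math
import random
from collections import Counter

# Hypothetical per-feature log-odds (NOT from GABI or the paper). A "feature"
# stands in for something like an assembly-graph edge; positive values favour
# including it in a hypothesis.
LOG_ODDS = {"e1": 3.0, "e2": 1.0, "e3": -0.5, "e4": -3.0}


def log_posterior(hyp):
    """Unnormalised log posterior of a hypothesis (a frozenset of features).

    Toy assumption: features contribute independently, so the exact marginal
    posterior of feature f is 1 / (1 + exp(-LOG_ODDS[f])).
    """
    return sum(LOG_ODDS[f] for f in hyp)


def propose(hyp, rng):
    """Symmetric proposal: toggle one uniformly chosen feature in or out."""
    feature = rng.choice(sorted(LOG_ODDS))
    return hyp ^ {feature}


def metropolis_hastings(n_iter, seed=0, burn_in=1000):
    """Return the hypotheses visited after burn-in by a Metropolis-Hastings chain."""
    rng = random.Random(seed)
    current = frozenset()
    current_lp = log_posterior(current)
    samples = []
    for i in range(n_iter):
        candidate = propose(current, rng)
        candidate_lp = log_posterior(candidate)
        # Accept with probability min(1, posterior ratio); the proposal is
        # symmetric, so the Hastings correction cancels.
        if rng.random() < math.exp(min(0.0, candidate_lp - current_lp)):
            current, current_lp = candidate, candidate_lp
        if i >= burn_in:
            samples.append(current)
    return samples


def feature_posteriors(samples):
    """Posterior probability of each feature: fraction of samples containing it."""
    counts = Counter(f for hyp in samples for f in hyp)
    return {f: counts[f] / len(samples) for f in LOG_ODDS}


def majority_rule_consensus(samples):
    """Keep the features present in more than half of the sampled hypotheses."""
    probs = feature_posteriors(samples)
    return frozenset(f for f, p in probs.items() if p > 0.5)


def annotate(external_features, samples):
    """Attach posterior probabilities to features an external assembly shares
    with the sampled hypothesis space; features outside it are skipped."""
    probs = feature_posteriors(samples)
    return {f: probs[f] for f in external_features if f in probs}


if __name__ == "__main__":
    samples = metropolis_hastings(50_000, seed=1)
    print(feature_posteriors(samples))
    print(sorted(majority_rule_consensus(samples)))
    # "x9" is a hypothetical feature absent from the sampled space.
    print(annotate({"e1", "e3", "x9"}, samples))
```

Because the toy features are independent, each feature's exact marginal posterior is 1 / (1 + exp(-log-odds)), which gives a simple check on the sampler: with the values above, only `e1` and `e2` exceed 0.5 and form the consensus. Real assembly hypotheses have strongly dependent features, which is why the mixing and convergence of a sampler must be assessed, as the abstract reports for φX174.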

Record Details
Journal: PLoS One (Research Article)
Publication Date: 2014-06-26
Record ID: pubmed-4072599
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
License: © 2014 Howison et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.