Cargando…

Efficient Sampling of Parsimonious Inversion Histories with Application to Genome Rearrangement in Yersinia

Inversions are among the most common mutations acting on the order and orientation of genes in a genome, and polynomial-time algorithms exist to obtain a minimal length series of inversions that transform one genome arrangement to another. However, the minimum length series of inversions (the optima...

Descripción completa

Detalles Bibliográficos
Autores principales: Miklós, István, Darling, Aaron E.
Formato: Texto
Lenguaje:English
Publicado: Oxford University Press 2009
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2817410/
https://www.ncbi.nlm.nih.gov/pubmed/20333186
http://dx.doi.org/10.1093/gbe/evp015
_version_ 1782177189435801600
author Miklós, István
Darling, Aaron E.
author_facet Miklós, István
Darling, Aaron E.
author_sort Miklós, István
collection PubMed
description Inversions are among the most common mutations acting on the order and orientation of genes in a genome, and polynomial-time algorithms exist to obtain a minimal length series of inversions that transform one genome arrangement to another. However, the minimum length series of inversions (the optimal sorting path) is often not unique as many such optimal sorting paths exist. If we assume that all optimal sorting paths are equally likely, then statistical inference on genome arrangement history must account for all such sorting paths and not just a single estimate. No deterministic polynomial algorithm is known to count the number of optimal sorting paths nor sample from the uniform distribution of optimal sorting paths. Here, we propose a stochastic method that uniformly samples the set of all optimal sorting paths. Our method uses a novel formulation of parallel Markov chain Monte Carlo. In practice, our method can quickly estimate the total number of optimal sorting paths. We introduce a variant of our approach in which short inversions are modeled to be more likely, and we show how the method can be used to estimate the distribution of inversion lengths and breakpoint usage in pathogenic Yersinia pestis. The proposed method has been implemented in a program called “MC4Inversion.” We draw comparison of MC4Inversion to the sampler implemented in BADGER and a previously described importance sampling (IS) technique. We find that on high-divergence data sets, MC4Inversion finds more optimal sorting paths per second than BADGER and the IS technique and simultaneously avoids bias inherent in the IS technique.
format Text
id pubmed-2817410
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-28174102010-03-22 Efficient Sampling of Parsimonious Inversion Histories with Application to Genome Rearrangement in Yersinia Miklós, István Darling, Aaron E. Genome Biol Evol Research Articles Inversions are among the most common mutations acting on the order and orientation of genes in a genome, and polynomial-time algorithms exist to obtain a minimal length series of inversions that transform one genome arrangement to another. However, the minimum length series of inversions (the optimal sorting path) is often not unique as many such optimal sorting paths exist. If we assume that all optimal sorting paths are equally likely, then statistical inference on genome arrangement history must account for all such sorting paths and not just a single estimate. No deterministic polynomial algorithm is known to count the number of optimal sorting paths nor sample from the uniform distribution of optimal sorting paths. Here, we propose a stochastic method that uniformly samples the set of all optimal sorting paths. Our method uses a novel formulation of parallel Markov chain Monte Carlo. In practice, our method can quickly estimate the total number of optimal sorting paths. We introduce a variant of our approach in which short inversions are modeled to be more likely, and we show how the method can be used to estimate the distribution of inversion lengths and breakpoint usage in pathogenic Yersinia pestis. The proposed method has been implemented in a program called “MC4Inversion.” We draw comparison of MC4Inversion to the sampler implemented in BADGER and a previously described importance sampling (IS) technique. We find that on high-divergence data sets, MC4Inversion finds more optimal sorting paths per second than BADGER and the IS technique and simultaneously avoids bias inherent in the IS technique. Oxford University Press 2009 2009-06-22 /pmc/articles/PMC2817410/ /pubmed/20333186 http://dx.doi.org/10.1093/gbe/evp015 Text en © The Author(s) 2009. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Articles
Miklós, István
Darling, Aaron E.
Efficient Sampling of Parsimonious Inversion Histories with Application to Genome Rearrangement in Yersinia
title Efficient Sampling of Parsimonious Inversion Histories with Application to Genome Rearrangement in Yersinia
title_full Efficient Sampling of Parsimonious Inversion Histories with Application to Genome Rearrangement in Yersinia
title_fullStr Efficient Sampling of Parsimonious Inversion Histories with Application to Genome Rearrangement in Yersinia
title_full_unstemmed Efficient Sampling of Parsimonious Inversion Histories with Application to Genome Rearrangement in Yersinia
title_short Efficient Sampling of Parsimonious Inversion Histories with Application to Genome Rearrangement in Yersinia
title_sort efficient sampling of parsimonious inversion histories with application to genome rearrangement in yersinia
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2817410/
https://www.ncbi.nlm.nih.gov/pubmed/20333186
http://dx.doi.org/10.1093/gbe/evp015
work_keys_str_mv AT miklosistvan efficientsamplingofparsimoniousinversionhistorieswithapplicationtogenomerearrangementinyersinia
AT darlingaarone efficientsamplingofparsimoniousinversionhistorieswithapplicationtogenomerearrangementinyersinia