
progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement

BACKGROUND: Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. METHODOLOGY/PRINCIPAL FINDINGS: We describe a new method to align...


Bibliographic Details
Main Authors: Darling, Aaron E., Mau, Bob, Perna, Nicole T.
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892488/
https://www.ncbi.nlm.nih.gov/pubmed/20593022
http://dx.doi.org/10.1371/journal.pone.0011147
_version_ 1782182954131259392
author Darling, Aaron E.
Mau, Bob
Perna, Nicole T.
author_facet Darling, Aaron E.
Mau, Bob
Perna, Nicole T.
author_sort Darling, Aaron E.
collection PubMed
description BACKGROUND: Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. METHODOLOGY/PRINCIPAL FINDINGS: We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46Mbp conserved among all taxa and a pan-genome of 15.2Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. 
Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence. CONCLUSIONS: The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve.
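The core- and pan-genome sizes mentioned in the abstract can be illustrated with a minimal sketch. It assumes a whole-genome alignment has already been reduced to a list of segments, each recording which genomes contain it and its length in base pairs. The `Block` type, the `core_and_pan_size` function, and the toy numbers are hypothetical illustrations; they are not progressiveMauve's output format, nor the authors' estimation procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical aligned segment (not progressiveMauve's own format)."""
    genomes: frozenset  # names of the genomes in which the segment is present
    length: int         # segment length in base pairs


def core_and_pan_size(blocks, all_genomes):
    """Return (core, pan) sizes in base pairs.

    core: total length of segments present in every genome.
    pan:  total length of all segments, each counted once.
    """
    all_genomes = frozenset(all_genomes)
    core = sum(b.length for b in blocks if b.genomes == all_genomes)
    pan = sum(b.length for b in blocks)
    return core, pan


# Toy data: three genomes, one shared segment and two partially shared ones.
blocks = [
    Block(frozenset({"A", "B", "C"}), 1000),
    Block(frozenset({"A", "B"}), 300),
    Block(frozenset({"C"}), 200),
]
core, pan = core_and_pan_size(blocks, {"A", "B", "C"})
print(core, pan)  # 1000 1500
```

Because the segments are counted regardless of annotation, this bookkeeping covers non-coding regions as well as genes, which is the extension of the core- and pan-genome concepts the abstract describes.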
format Text
id pubmed-2892488
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2892488 2010-06-30 progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement Darling, Aaron E. Mau, Bob Perna, Nicole T. PLoS One Research Article BACKGROUND: Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. METHODOLOGY/PRINCIPAL FINDINGS: We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46Mbp conserved among all taxa and a pan-genome of 15.2Mbp.
We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence. CONCLUSIONS: The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve. Public Library of Science 2010-06-25 /pmc/articles/PMC2892488/ /pubmed/20593022 http://dx.doi.org/10.1371/journal.pone.0011147 Text en Darling et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Darling, Aaron E.
Mau, Bob
Perna, Nicole T.
progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement
title progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement
title_full progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement
title_fullStr progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement
title_full_unstemmed progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement
title_short progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement
title_sort progressivemauve: multiple genome alignment with gene gain, loss and rearrangement
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892488/
https://www.ncbi.nlm.nih.gov/pubmed/20593022
http://dx.doi.org/10.1371/journal.pone.0011147
work_keys_str_mv AT darlingaarone progressivemauvemultiplegenomealignmentwithgenegainlossandrearrangement
AT maubob progressivemauvemultiplegenomealignmentwithgenegainlossandrearrangement
AT pernanicolet progressivemauvemultiplegenomealignmentwithgenegainlossandrearrangement