
A statistical method to identify recombination in bacterial genomes based on SNP incompatibility


Bibliographic Details
Main Authors: Lai, Yi-Pin; Ioerger, Thomas R.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6251179/
https://www.ncbi.nlm.nih.gov/pubmed/30466385
http://dx.doi.org/10.1186/s12859-018-2456-z
_version_ 1783373066188881920
author Lai, Yi-Pin
Ioerger, Thomas R.
author_facet Lai, Yi-Pin
Ioerger, Thomas R.
author_sort Lai, Yi-Pin
collection PubMed
description BACKGROUND: Phylogeny estimation for bacteria is likely to reflect their true evolutionary histories only if they are highly clonal. However, recombination events could occur during evolution for some species. The reconstruction of phylogenetic trees from an alignment without considering recombination could be misleading, since the relationships among strains in some parts of the genome might be different than in others. Using a single, global tree can create the appearance of homoplasy in recombined regions. Hence, the identification of recombination breakpoints is essential to better understand the evolutionary relationships of isolates among a bacterial population. RESULTS: Previously, we have developed a method (called ACR) to detect potential breakpoints in an alignment by evaluating compatibility of polymorphic sites in a sliding window. To assess the statistical significance of candidate breakpoints, we propose an extension of the algorithm (ptACR) that applies a permutation test to generate a null distribution for comparing the average local compatibility. The performance of ptACR is evaluated on both simulated and empirical datasets. ptACR is shown to have similar sensitivity (true positive rate) but a lower false positive rate and higher F1 score compared to basic ACR. When used to analyze a collection of clinical isolates of Staphylococcus aureus, ptACR finds clear evidence of recombination events in this bacterial pathogen, and is able to identify statistically significant boundaries of chromosomal regions with distinct phylogenies. CONCLUSIONS: ptACR is an accurate and efficient method for identifying genomic regions affected by recombination in bacterial genomes.
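The abstract describes scoring the compatibility of polymorphic sites in a sliding window and using a permutation test to build a null distribution. The Python sketch below illustrates that general idea only; it is not the authors' ACR or ptACR implementation. It assumes biallelic SNPs coded as 0/1 per strain, uses the classical four-gamete test as the pairwise compatibility criterion, and shuffles site order to form the null. The function names, the window parameter `half_width`, and the toy data are all hypothetical.

```python
import random
from itertools import combinations

def compatible(site_a, site_b):
    """Four-gamete test (assumed criterion): two biallelic sites are
    compatible if fewer than four allele combinations are observed."""
    return len(set(zip(site_a, site_b))) < 4

def window_compatibility(sites, center, half_width):
    """Fraction of compatible site pairs among sites in
    [center - half_width, center + half_width)."""
    lo = max(0, center - half_width)
    hi = min(len(sites), center + half_width)
    pairs = list(combinations(range(lo, hi), 2))
    if not pairs:
        return 1.0
    return sum(compatible(sites[i], sites[j]) for i, j in pairs) / len(pairs)

def permutation_p_value(sites, center, half_width, n_perm=200, seed=0):
    """One-sided permutation p-value: how often a random ordering of the
    sites yields a window at least as incompatible as the observed one."""
    rng = random.Random(seed)
    observed = window_compatibility(sites, center, half_width)
    hits = 0
    for _ in range(n_perm):
        shuffled = sites[:]          # copy so the input order is preserved
        rng.shuffle(shuffled)
        if window_compatibility(shuffled, center, half_width) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy alignment (hypothetical): 4 strains, 8 SNP sites. The first four sites
# support one split of the strains, the last four support a conflicting split.
pattern_a = (0, 0, 1, 1)
pattern_b = (0, 1, 0, 1)
sites = [pattern_a] * 4 + [pattern_b] * 4

for center in range(2, 7):
    score, p = permutation_p_value(sites, center, half_width=2)
    print(center, round(score, 3), round(p, 3))
```

In this toy case the window score drops where the window spans both blocks of sites, which is the kind of local signal a breakpoint detector looks for. How the published method defines its statistic, chooses window sizes, and calls significance should be taken from the article itself.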
format Online
Article
Text
id pubmed-6251179
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6251179 2018-11-29 A statistical method to identify recombination in bacterial genomes based on SNP incompatibility Lai, Yi-Pin Ioerger, Thomas R. BMC Bioinformatics Methodology Article
BioMed Central 2018-11-22 /pmc/articles/PMC6251179/ /pubmed/30466385 http://dx.doi.org/10.1186/s12859-018-2456-z Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Lai, Yi-Pin
Ioerger, Thomas R.
A statistical method to identify recombination in bacterial genomes based on SNP incompatibility
title A statistical method to identify recombination in bacterial genomes based on SNP incompatibility
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6251179/
https://www.ncbi.nlm.nih.gov/pubmed/30466385
http://dx.doi.org/10.1186/s12859-018-2456-z