
Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq

BACKGROUND: Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobil...


Bibliographic Details
Main authors: Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4300727/
https://www.ncbi.nlm.nih.gov/pubmed/25432719
http://dx.doi.org/10.1186/1471-2164-15-1039
author Barrick, Jeffrey E
Colburn, Geoffrey
Deatherage, Daniel E
Traverse, Charles C
Strand, Matthew D
Borges, Jordan J
Knoester, David B
Reba, Aaron
Meyer, Austin G
collection PubMed
description BACKGROUND: Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events.
RESULTS: We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold).
CONCLUSIONS: Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-1039) contains supplementary material, which is available to authorized users.
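As a rough illustration of the two kinds of evidence the abstract describes (reads that span a new sequence junction, and chromosomal regions with missing read coverage), the following Python sketch may help. It is not breseq's implementation. The function names, the use of zero-coverage runs as deletion candidates, and the count of distinct read start positions as a crude stand-in for "coverage evenness" are all assumptions made for this example; the abstract does not specify the actual statistical model.

```python
from typing import List, Tuple


def find_zero_coverage_runs(coverage: List[int],
                            min_length: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where read depth is zero.

    Hypothetical helper: long zero-depth runs are treated here as
    candidate deleted regions.
    """
    runs = []
    start = None
    for i, depth in enumerate(coverage):
        if depth == 0:
            if start is None:
                start = i          # a new zero-depth run begins
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None           # the run has ended
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_length:
        runs.append((start, len(coverage)))
    return runs


def distinct_spanning_starts(read_starts: List[int],
                             junction: int,
                             read_length: int,
                             min_overlap: int = 1) -> int:
    """Count distinct start positions of reads spanning a junction.

    A read covers positions [start, start + read_length). It spans the
    junction (the boundary just before position `junction`) if at least
    `min_overlap` bases fall on each side. Many distinct start positions
    suggest evenly distributed support; a pile of identical reads does not.
    This proxy is an assumption of the sketch, not a documented breseq score.
    """
    spanning = {
        s for s in read_starts
        if junction - s >= min_overlap
        and (s + read_length) - junction >= min_overlap
    }
    return len(spanning)


if __name__ == "__main__":
    depth = [5, 4, 0, 0, 0, 3, 0]
    print(find_zero_coverage_runs(depth, min_length=2))
    starts = [2, 4, 4, 5, 5, 6, 9]
    print(distinct_spanning_starts(starts, junction=8, read_length=5,
                                   min_overlap=2))
```

In this toy setup, a candidate junction would be accepted only if enough distinct read start positions support it, and a candidate deletion only if the zero-coverage run is long enough. Both thresholds are illustrative parameters, not values taken from the article.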
format Online
Article
Text
id pubmed-4300727
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4300727 2015-01-22
Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq
Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G
BMC Genomics
Software
BioMed Central 2014-11-29
/pmc/articles/PMC4300727/ /pubmed/25432719 http://dx.doi.org/10.1186/1471-2164-15-1039
Text en
© Barrick et al.; licensee BioMed Central Ltd. 2014
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4300727/
https://www.ncbi.nlm.nih.gov/pubmed/25432719
http://dx.doi.org/10.1186/1471-2164-15-1039