
Variant profiling of evolving prokaryotic populations

Genomic heterogeneity of bacterial species is observed and studied in experimental evolution experiments and clinical diagnostics, and occurs as micro-diversity of natural habitats. The challenge for genome research is to accurately capture this heterogeneity with the currently used short sequencing...


Bibliographic Details
Main authors: Zojer, Markus, Schuster, Lisa N., Schulz, Frederik, Pfundner, Alexander, Horn, Matthias, Rattei, Thomas
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5316281/
https://www.ncbi.nlm.nih.gov/pubmed/28224054
http://dx.doi.org/10.7717/peerj.2997
_version_ 1782508822441492480
author Zojer, Markus
Schuster, Lisa N.
Schulz, Frederik
Pfundner, Alexander
Horn, Matthias
Rattei, Thomas
author_facet Zojer, Markus
Schuster, Lisa N.
Schulz, Frederik
Pfundner, Alexander
Horn, Matthias
Rattei, Thomas
author_sort Zojer, Markus
collection PubMed
description Genomic heterogeneity of bacterial species is observed and studied in experimental evolution experiments and clinical diagnostics, and occurs as micro-diversity of natural habitats. The challenge for genome research is to accurately capture this heterogeneity with the currently used short sequencing reads. Recent advances in NGS technologies improved the speed and coverage and thus allowed for deep sequencing of bacterial populations. This facilitates the quantitative assessment of genomic heterogeneity, including low frequency alleles or haplotypes. However, false positive variant predictions due to sequencing errors and mapping artifacts of short reads need to be prevented. We therefore created VarCap, a workflow for the reliable prediction of different types of variants even at low frequencies. In order to predict SNPs, InDels and structural variations, we evaluated the sensitivity and accuracy of different software tools using synthetic read data. The results suggested that the best sensitivity could be reached by a union of different tools, however at the price of increased false positives. We identified possible reasons for false predictions and used this knowledge to improve the accuracy by post-filtering the predicted variants according to properties such as frequency, coverage, genomic environment/localization and co-localization with other variants. We observed that best precision was achieved by using an intersection of at least two tools per variant. This resulted in the reliable prediction of variants above a minimum relative abundance of 2%. VarCap is designed for being routinely used within experimental evolution experiments or for clinical diagnostics. The detected variants are reported as frequencies within a VCF file and as a graphical overview of the distribution of the different variant/allele/haplotype frequencies. The source code of VarCap is available at https://github.com/ma2o/VarCap. 
In order to provide this workflow to a broad community, we implemented VarCap on a Galaxy webserver, which is accessible at http://galaxy.csb.univie.ac.at.
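The trade-off described in the abstract (a union of tools maximizes sensitivity, while requiring agreement of at least two tools and a minimum relative abundance of 2% improves precision) can be illustrated with a small sketch. This is not VarCap's code: the function names, the tool labels, the variant tuples and the use of a mean frequency across tools are all assumptions made for illustration only.

```python
from collections import defaultdict


def union_calls(calls_by_tool):
    """Return every variant reported by at least one tool (most sensitive)."""
    return set().union(*(calls.keys() for calls in calls_by_tool.values()))


def consensus_calls(calls_by_tool, min_tools=2, min_freq=0.02):
    """Keep variants supported by >= min_tools tools and >= min_freq abundance.

    calls_by_tool maps a (hypothetical) tool name to a dict of
    {(position, ref, alt): frequency}. Averaging the per-tool frequencies
    is an assumption of this sketch, not a documented VarCap rule.
    """
    support = defaultdict(list)
    for calls in calls_by_tool.values():
        for variant, freq in calls.items():
            support[variant].append(freq)

    kept = {}
    for variant, freqs in support.items():
        mean_freq = sum(freqs) / len(freqs)
        if len(freqs) >= min_tools and mean_freq >= min_freq:
            kept[variant] = mean_freq
    return kept


# Made-up example calls from three hypothetical callers.
calls_by_tool = {
    "tool_a": {(1200, "A", "G"): 0.05, (3400, "C", "T"): 0.01},
    "tool_b": {(1200, "A", "G"): 0.07, (5600, "G", "GA"): 0.30},
    "tool_c": {(3400, "C", "T"): 0.01, (5600, "G", "GA"): 0.28},
}

all_variants = union_calls(calls_by_tool)
reliable = consensus_calls(calls_by_tool)

if __name__ == "__main__":
    print(len(all_variants), "variants in the union")
    for variant in sorted(reliable):
        print(variant, round(reliable[variant], 3))
```

In this toy input, the variant at position 3400 is reported by two tools but falls below the 2% threshold, so it appears in the union and is dropped from the consensus set.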
format Online
Article
Text
id pubmed-5316281
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-53162812017-02-21 Variant profiling of evolving prokaryotic populations Zojer, Markus Schuster, Lisa N. Schulz, Frederik Pfundner, Alexander Horn, Matthias Rattei, Thomas PeerJ Bioinformatics Genomic heterogeneity of bacterial species is observed and studied in experimental evolution experiments and clinical diagnostics, and occurs as micro-diversity of natural habitats. The challenge for genome research is to accurately capture this heterogeneity with the currently used short sequencing reads. Recent advances in NGS technologies improved the speed and coverage and thus allowed for deep sequencing of bacterial populations. This facilitates the quantitative assessment of genomic heterogeneity, including low frequency alleles or haplotypes. However, false positive variant predictions due to sequencing errors and mapping artifacts of short reads need to be prevented. We therefore created VarCap, a workflow for the reliable prediction of different types of variants even at low frequencies. In order to predict SNPs, InDels and structural variations, we evaluated the sensitivity and accuracy of different software tools using synthetic read data. The results suggested that the best sensitivity could be reached by a union of different tools, however at the price of increased false positives. We identified possible reasons for false predictions and used this knowledge to improve the accuracy by post-filtering the predicted variants according to properties such as frequency, coverage, genomic environment/localization and co-localization with other variants. We observed that best precision was achieved by using an intersection of at least two tools per variant. This resulted in the reliable prediction of variants above a minimum relative abundance of 2%. VarCap is designed for being routinely used within experimental evolution experiments or for clinical diagnostics. 
The detected variants are reported as frequencies within a VCF file and as a graphical overview of the distribution of the different variant/allele/haplotype frequencies. The source code of VarCap is available at https://github.com/ma2o/VarCap. In order to provide this workflow to a broad community, we implemented VarCap on a Galaxy webserver, which is accessible at http://galaxy.csb.univie.ac.at. PeerJ Inc. 2017-02-16 /pmc/articles/PMC5316281/ /pubmed/28224054 http://dx.doi.org/10.7717/peerj.2997 Text en ©2017 Zojer et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Zojer, Markus
Schuster, Lisa N.
Schulz, Frederik
Pfundner, Alexander
Horn, Matthias
Rattei, Thomas
Variant profiling of evolving prokaryotic populations
title Variant profiling of evolving prokaryotic populations
title_full Variant profiling of evolving prokaryotic populations
title_fullStr Variant profiling of evolving prokaryotic populations
title_full_unstemmed Variant profiling of evolving prokaryotic populations
title_short Variant profiling of evolving prokaryotic populations
title_sort variant profiling of evolving prokaryotic populations
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5316281/
https://www.ncbi.nlm.nih.gov/pubmed/28224054
http://dx.doi.org/10.7717/peerj.2997
work_keys_str_mv AT zojermarkus variantprofilingofevolvingprokaryoticpopulations
AT schusterlisan variantprofilingofevolvingprokaryoticpopulations
AT schulzfrederik variantprofilingofevolvingprokaryoticpopulations
AT pfundneralexander variantprofilingofevolvingprokaryoticpopulations
AT hornmatthias variantprofilingofevolvingprokaryoticpopulations
AT ratteithomas variantprofilingofevolvingprokaryoticpopulations