
KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes

BACKGROUND: High-throughput DNA sequencing produces vast amounts of data, with millions of short reads that usually have to be mapped to a reference genome or newly assembled. Both reference-based mapping and de novo assembly are computationally intensive, generating large intermediary data files, a...


Bibliographic Details
Main Authors: Steiner, Andreas; Stucki, David; Coscolla, Mireia; Borrell, Sonia; Gagneux, Sebastien
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4197298/
https://www.ncbi.nlm.nih.gov/pubmed/25297886
http://dx.doi.org/10.1186/1471-2164-15-881
author Steiner, Andreas
Stucki, David
Coscolla, Mireia
Borrell, Sonia
Gagneux, Sebastien
collection PubMed
description BACKGROUND: High-throughput DNA sequencing produces vast amounts of data, with millions of short reads that usually have to be mapped to a reference genome or newly assembled. Both reference-based mapping and de novo assembly are computationally intensive, generating large intermediary data files, and thus require bioinformatics skills that are often lacking in the laboratories producing the data. Moreover, many research and practical applications in microbiology require only a small fraction of the whole genome data. RESULTS: We developed KvarQ, a new tool that directly scans fastq files of bacterial genome sequences for known variants, such as single nucleotide polymorphisms (SNPs), bypassing the need to map all sequencing reads to a reference genome or to perform de novo assembly. Instead, KvarQ loads “testsuites” that define specific SNPs or short regions of interest in a reference genome, and directly synthesizes the relevant results based on the occurrence of these markers in the fastq files. KvarQ has a versatile command line interface and a graphical user interface. KvarQ currently ships with two “testsuites” for Mycobacterium tuberculosis, but new “testsuites” for other organisms can easily be created and distributed. In this article, we demonstrate how KvarQ can be used to successfully detect all main drug resistance mutations and phylogenetic markers in 880 bacterial whole genome sequences. The average scanning time per genome sequence was two minutes. The variant calls of a subset of these genomes were validated with a standard bioinformatics pipeline and showed >99% congruence. CONCLUSION: KvarQ is a user-friendly tool that directly extracts relevant information from fastq files. This enables researchers and laboratory technicians with limited bioinformatics expertise to scan and analyze raw sequencing data in a matter of minutes. KvarQ is open-source, and pre-compiled packages with a graphical user interface are available at http://www.swisstph.ch/kvarq. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-881) contains supplementary material, which is available to authorized users.
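The idea described in the abstract, scanning reads directly for short marker sequences instead of mapping them, can be illustrated with a minimal sketch. This is not KvarQ's implementation or API: the marker table `MARKERS`, the functions `scan` and `call`, the `min_reads` threshold, and the demo sequences are all hypothetical, and the sketch uses exact substring matching and ignores base quality scores.

```python
import io

# Hypothetical marker table: name -> (reference k-mer, variant k-mer).
# The sequences are invented for illustration; they are not real loci.
MARKERS = {
    "demo_snp_1": ("ACGTTGCAAGGCTTA", "ACGTTGCTAGGCTTA"),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def scan(handle, markers=MARKERS):
    """Count reads containing each marker allele on either strand."""
    counts = {name: {"ref": 0, "var": 0} for name in markers}
    for seq in read_sequences(handle):
        for name, (ref, var) in markers.items():
            for label, kmer in (("ref", ref), ("var", var)):
                if kmer in seq or reverse_complement(kmer) in seq:
                    counts[name][label] += 1
    return counts


def call(counts, min_reads=2):
    """Turn read counts into a simple per-marker call."""
    calls = {}
    for name, c in counts.items():
        if c["var"] >= min_reads and c["var"] > c["ref"]:
            calls[name] = "variant"
        elif c["ref"] >= min_reads:
            calls[name] = "reference"
        else:
            calls[name] = "low coverage"
    return calls


if __name__ == "__main__":
    ref, var = MARKERS["demo_snp_1"]
    reads = ["GG" + var + "CC", "TT" + reverse_complement(var) + "AA", "A" + var]
    fastq = "".join(
        f"@read{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(reads)
    )
    print(call(scan(io.StringIO(fastq))))
```

Because only the sequence line of each record is inspected and nothing is written to disk, a scan of this kind makes a single pass over the file and produces no intermediary alignment files, which is the property the abstract emphasizes.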
format Online
Article
Text
id pubmed-4197298
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4197298 2014-10-16 KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes Steiner, Andreas Stucki, David Coscolla, Mireia Borrell, Sonia Gagneux, Sebastien BMC Genomics Software BioMed Central 2014-10-09 /pmc/articles/PMC4197298/ /pubmed/25297886 http://dx.doi.org/10.1186/1471-2164-15-881 Text en © Steiner et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Steiner, Andreas
Stucki, David
Coscolla, Mireia
Borrell, Sonia
Gagneux, Sebastien
KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes
title KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes
topic Software