
Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data

Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and thus, also in the output. We evaluated eight open-source tools regar...


Bibliographic Details
Main authors: Sandmann, Sarah, de Graaf, Aniek O., Karimi, Mohsen, van der Reijden, Bert A., Hellström-Lindberg, Eva, Jansen, Joop H., Dugas, Martin
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5324109/
https://www.ncbi.nlm.nih.gov/pubmed/28233799
http://dx.doi.org/10.1038/srep43169
_version_ 1782510156386402304
author Sandmann, Sarah
de Graaf, Aniek O.
Karimi, Mohsen
van der Reijden, Bert A.
Hellström-Lindberg, Eva
Jansen, Joop H.
Dugas, Martin
author_sort Sandmann, Sarah
collection PubMed
description Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and, thus, also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, by re-sequencing on a different platform and by expert-based review. In addition, we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases, an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. The influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading.
format Online
Article
Text
id pubmed-5324109
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-53241092017-03-01 Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data Sandmann, Sarah de Graaf, Aniek O. Karimi, Mohsen van der Reijden, Bert A. Hellström-Lindberg, Eva Jansen, Joop H. Dugas, Martin Sci Rep Article Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and, thus, also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, by re-sequencing on a different platform and by expert-based review. In addition, we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases, an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. The influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading. Nature Publishing Group 2017-02-24 /pmc/articles/PMC5324109/ /pubmed/28233799 http://dx.doi.org/10.1038/srep43169 Text en Copyright © 2017, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
spellingShingle Article
Sandmann, Sarah
de Graaf, Aniek O.
Karimi, Mohsen
van der Reijden, Bert A.
Hellström-Lindberg, Eva
Jansen, Joop H.
Dugas, Martin
Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data
title Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5324109/
https://www.ncbi.nlm.nih.gov/pubmed/28233799
http://dx.doi.org/10.1038/srep43169