Variant Callers for Next-Generation Sequencing Data: A Comparison Study

Next generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data. The performance of these pipelines depends crucially on the variant caller used and on the...

Bibliographic Details
Main Authors: Liu, Xiangtao; Han, Shizhong; Wang, Zuoheng; Gelernter, Joel; Yang, Bao-Zhu
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3785481/
https://www.ncbi.nlm.nih.gov/pubmed/24086590
http://dx.doi.org/10.1371/journal.pone.0075619
_version_ 1782477668497752064
author Liu, Xiangtao
Han, Shizhong
Wang, Zuoheng
Gelernter, Joel
Yang, Bao-Zhu
author_facet Liu, Xiangtao
Han, Shizhong
Wang, Zuoheng
Gelernter, Joel
Yang, Bao-Zhu
author_sort Liu, Xiangtao
collection PubMed
description Next generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data. The performance of these pipelines depends crucially on the variant caller used and on the calling strategies implemented. We studied the performance of four prevailing callers, SAMtools, GATK, glftools and Atlas2, using single-sample and multiple-sample variant-calling strategies. Using the same aligner, BWA, we built four single-sample and three multiple-sample calling pipelines and applied the pipelines to whole exome sequencing data taken from 20 individuals. We obtained genotypes generated by Illumina Infinium HumanExome v1.1 Beadchip for validation analysis and then used Sanger sequencing as a “gold-standard” method to resolve discrepancies for selected regions of high discordance. Finally, we compared the sensitivity of three of the single-sample calling pipelines using known simulated whole genome sequence data as a gold standard. Overall, for single-sample calling, the called variants were highly consistent across callers and the pairwise overlapping rate was about 0.9. Compared with other callers, GATK had the highest rediscovery rate (0.9969) and specificity (0.99996), and the Ti/Tv ratio out of GATK was closest to the expected value of 3.02. Multiple-sample calling increased the sensitivity. Results from the simulated data suggested that GATK outperformed SAMtools and glfSingle in sensitivity, especially for low coverage data. Further, for the selected discrepant regions evaluated by Sanger sequencing, variant genotypes called by exome sequencing versus the exome array were more accurate, although the average variant sensitivity and overall genotype consistency rate were as high as 95.87% and 99.82%, respectively. In conclusion, GATK showed several advantages over other variant callers for general purpose NGS analyses. The GATK pipelines we developed perform very well.
format Online
Article
Text
id pubmed-3785481
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3785481 2013-10-01 Variant Callers for Next-Generation Sequencing Data: A Comparison Study Liu, Xiangtao Han, Shizhong Wang, Zuoheng Gelernter, Joel Yang, Bao-Zhu PLoS One Research Article Next generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data. The performance of these pipelines depends crucially on the variant caller used and on the calling strategies implemented. We studied the performance of four prevailing callers, SAMtools, GATK, glftools and Atlas2, using single-sample and multiple-sample variant-calling strategies. Using the same aligner, BWA, we built four single-sample and three multiple-sample calling pipelines and applied the pipelines to whole exome sequencing data taken from 20 individuals. We obtained genotypes generated by Illumina Infinium HumanExome v1.1 Beadchip for validation analysis and then used Sanger sequencing as a “gold-standard” method to resolve discrepancies for selected regions of high discordance. Finally, we compared the sensitivity of three of the single-sample calling pipelines using known simulated whole genome sequence data as a gold standard. Overall, for single-sample calling, the called variants were highly consistent across callers and the pairwise overlapping rate was about 0.9. Compared with other callers, GATK had the highest rediscovery rate (0.9969) and specificity (0.99996), and the Ti/Tv ratio out of GATK was closest to the expected value of 3.02. Multiple-sample calling increased the sensitivity. Results from the simulated data suggested that GATK outperformed SAMtools and glfSingle in sensitivity, especially for low coverage data. Further, for the selected discrepant regions evaluated by Sanger sequencing, variant genotypes called by exome sequencing versus the exome array were more accurate, although the average variant sensitivity and overall genotype consistency rate were as high as 95.87% and 99.82%, respectively. In conclusion, GATK showed several advantages over other variant callers for general purpose NGS analyses. The GATK pipelines we developed perform very well. Public Library of Science 2013-09-27 /pmc/articles/PMC3785481/ /pubmed/24086590 http://dx.doi.org/10.1371/journal.pone.0075619 Text en © 2013 Liu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Liu, Xiangtao
Han, Shizhong
Wang, Zuoheng
Gelernter, Joel
Yang, Bao-Zhu
Variant Callers for Next-Generation Sequencing Data: A Comparison Study
title Variant Callers for Next-Generation Sequencing Data: A Comparison Study
title_full Variant Callers for Next-Generation Sequencing Data: A Comparison Study
title_fullStr Variant Callers for Next-Generation Sequencing Data: A Comparison Study
title_full_unstemmed Variant Callers for Next-Generation Sequencing Data: A Comparison Study
title_short Variant Callers for Next-Generation Sequencing Data: A Comparison Study
title_sort variant callers for next-generation sequencing data: a comparison study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3785481/
https://www.ncbi.nlm.nih.gov/pubmed/24086590
http://dx.doi.org/10.1371/journal.pone.0075619
work_keys_str_mv AT liuxiangtao variantcallersfornextgenerationsequencingdataacomparisonstudy
AT hanshizhong variantcallersfornextgenerationsequencingdataacomparisonstudy
AT wangzuoheng variantcallersfornextgenerationsequencingdataacomparisonstudy
AT gelernterjoel variantcallersfornextgenerationsequencingdataacomparisonstudy
AT yangbaozhu variantcallersfornextgenerationsequencingdataacomparisonstudy