Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data

Bibliographic Details
Main authors: Kumaran, Manojkumar; Subramanian, Umadevi; Devarajan, Bharanidharan
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6580603/
https://www.ncbi.nlm.nih.gov/pubmed/31208315
http://dx.doi.org/10.1186/s12859-019-2928-9
author Kumaran, Manojkumar
Subramanian, Umadevi
Devarajan, Bharanidharan
collection PubMed
description BACKGROUND: Whole exome sequencing (WES) is a cost-effective method for identifying clinical variants, but it demands accurate variant calling tools. Currently available tools vary in their accuracy in predicting specific clinical variants. It may nevertheless be possible to find the best combination of aligner and variant caller for accurately detecting single nucleotide variants (SNVs) and small insertions and deletions (InDels) separately. Moreover, many important aspects of InDel detection, particularly InDel length in base pairs, are overlooked when the performance of tools is compared. RESULTS: We assessed the performance of variant calling pipelines built from combinations of four variant callers and five aligners on human NA12878 and simulated exome data. We used high-confidence variant calls from the Genome in a Bottle (GiaB) consortium for validation, and GRCh37 and GRCh38 as the human reference genomes. Based on the performance metrics, both the BWA and Novoalign aligners performed better with the DeepVariant and SAMtools callers for detecting SNVs, and with DeepVariant and GATK for InDels. Furthermore, we obtained similar results on human NA24385 and NA24631 exome data from GiaB. CONCLUSION: In this study, DeepVariant with BWA and Novoalign performed best for accurately detecting SNVs and InDels. The accuracy of variant calling was improved by merging the top-performing pipelines. The results of our study provide useful recommendations for the analysis of WES data in clinical genomics. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2928-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6580603
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6580603 2019-06-24 Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data Kumaran, Manojkumar Subramanian, Umadevi Devarajan, Bharanidharan BMC Bioinformatics Research Article
BioMed Central 2019-06-17 /pmc/articles/PMC6580603/ /pubmed/31208315 http://dx.doi.org/10.1186/s12859-019-2928-9 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data
topic Research Article