Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data

In the design of whole-genome sequencing (WGS) studies, sequencing depth is a crucial parameter to define variant calling accuracy and study cost, with no standard recommendations having been established. We empirically evaluated the variant calling accuracy of the WGS pipeline using ultra-deep WGS...

Bibliographic Details
Main authors: Kishikawa, Toshihiro, Momozawa, Yukihide, Ozeki, Takeshi, Mushiroda, Taisei, Inohara, Hidenori, Kamatani, Yoichiro, Kubo, Michiaki, Okada, Yukinori
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6370902/
https://www.ncbi.nlm.nih.gov/pubmed/30741997
http://dx.doi.org/10.1038/s41598-018-38346-0
author Kishikawa, Toshihiro
Momozawa, Yukihide
Ozeki, Takeshi
Mushiroda, Taisei
Inohara, Hidenori
Kamatani, Yoichiro
Kubo, Michiaki
Okada, Yukinori
collection PubMed
description In the design of whole-genome sequencing (WGS) studies, sequencing depth is a crucial parameter to define variant calling accuracy and study cost, with no standard recommendations having been established. We empirically evaluated the variant calling accuracy of the WGS pipeline using ultra-deep WGS data (approximately 410×). We randomly sampled sequence reads and constructed a series of simulation WGS datasets with a variety of gradual depths (n = 54; from 0.05× to 410×). Next, we evaluated the genotype concordances of the WGS data with those in the SNP microarray data or the WGS data using all the sequence reads. In addition, we assessed the accuracy of HLA allele genotyping using the WGS data with multiple software tools (PHLAT, HLA-VBseq, HLA-HD, and SNP2HLA). The WGS data with higher depths showed higher concordance rates, and >13.7× depth achieved as high as >99% of concordance. Comparisons with the WGS data using all the sequence reads showed that SNVs achieved >95% of concordance at 17.6× depth, whereas indels showed only 60% concordance. For the accuracy of HLA allele genotyping using the WGS data, 13.7× depth showed sufficient accuracy while performance heterogeneity among the software tools was observed (the highest concordance of 96.9% was observed with HLA-HD). Improvement in HLA genotyping accuracy by further increasing the depths was limited. These results suggest a medium degree of the WGS depth setting (approximately 15×) to achieve both accurate SNV calling and cost-effectiveness, whereas relatively higher depths are required for accurate indel calling.
format Online
Article
Text
id pubmed-6370902
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6370902 2019-02-15 Sci Rep Article
Nature Publishing Group UK 2019-02-11 /pmc/articles/PMC6370902/ /pubmed/30741997 http://dx.doi.org/10.1038/s41598-018-38346-0 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data
topic Article