
Reproducibility of Variant Calls in Replicate Next Generation Sequencing Experiments


Bibliographic Details
Main authors: Qi, Yuan; Liu, Xiuping; Liu, Chang-gong; Wang, Bailing; Hess, Kenneth R.; Symmans, W. Fraser; Shi, Weiwei; Pusztai, Lajos
Format: Online Article Text
Language: English
Published: Public Library of Science, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4489803/
https://www.ncbi.nlm.nih.gov/pubmed/26136146
http://dx.doi.org/10.1371/journal.pone.0119230
Abstract

Nucleotide alterations detected by next generation sequencing are not always true biological changes; they could represent sequencing errors. Even highly accurate methods can yield substantial error rates when applied to millions of nucleotides. In this study, we examined the reproducibility of nucleotide variant calls in replicate sequencing experiments of the same genomic DNA. We performed targeted sequencing of all known human protein kinase genes (the kinome, ~3.2 Mb) using the SOLiD v4 platform. Seventeen breast cancer samples were sequenced in duplicate (n=14) or triplicate (n=3) to assess the concordance of all calls and of single nucleotide variant (SNV) calls. The concordance rates over the entire sequenced region were >99.99%, while the concordance rates for SNVs were 54.3-75.5%. There was substantial variation in basic sequencing metrics from experiment to experiment. The type of nucleotide substitution and the genomic location of the variant had little impact on concordance, but concordance increased with coverage level, variant allele count (VAC), variant allele frequency (VAF), variant allele quality and the p-value of the SNV call. The most important determinants of concordance were VAC and VAF. Even at the highest stringency of QC metrics, the reproducibility of SNV calls was around 80%, suggesting that the rate of erroneous variant calls can be as high as 20-40% in a single experiment. The sequence data have been deposited in the European Genome-phenome Archive (EGA) under accession number EGAS00001000826.
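The abstract does not state exactly how concordance between replicates was computed. As an illustrative sketch only, assuming concordance is the number of SNV calls shared by two replicates divided by the number of distinct calls found in either one, the Python below shows how filtering both call sets on variant allele count (VAC) and variant allele frequency (VAF) changes that rate. The `SNVCall` type, the `concordance` and `passes` helpers, the thresholds and the toy calls are all hypothetical; they are not the study's data, pipeline or definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SNVCall:
    """One SNV call from a single replicate (hypothetical fields)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    vac: int    # variant allele count: reads supporting the alternate allele
    depth: int  # total reads covering the position (coverage)

    @property
    def vaf(self) -> float:
        """Variant allele frequency, taken here as VAC / coverage."""
        return self.vac / self.depth if self.depth else 0.0

    @property
    def key(self) -> tuple:
        """Variant identity used to match calls across replicates."""
        return (self.chrom, self.pos, self.ref, self.alt)


def passes(call: SNVCall, min_vac: int, min_vaf: float) -> bool:
    """True if the call meets both the VAC and the VAF cut-off."""
    return call.vac >= min_vac and call.vaf >= min_vaf


def concordance(rep1, rep2, min_vac: int = 0, min_vaf: float = 0.0) -> float:
    """Shared calls / all distinct calls, after filtering both replicates.

    This intersection-over-union definition is an assumption made for
    illustration, not necessarily the definition used in the study.
    """
    a = {c.key for c in rep1 if passes(c, min_vac, min_vaf)}
    b = {c.key for c in rep2 if passes(c, min_vac, min_vaf)}
    union = a | b
    if not union:
        return 1.0  # no calls in either replicate: an arbitrary convention
    return len(a & b) / len(union)


# Toy replicate call sets, invented for illustration (not study data).
rep1 = [
    SNVCall("chr1", 100, "A", "G", vac=20, depth=40),
    SNVCall("chr1", 200, "C", "T", vac=2, depth=50),   # low support, rep1 only
    SNVCall("chr2", 300, "G", "A", vac=15, depth=30),
]
rep2 = [
    SNVCall("chr1", 100, "A", "G", vac=18, depth=38),
    SNVCall("chr2", 300, "G", "A", vac=12, depth=28),
    SNVCall("chr3", 400, "T", "C", vac=3, depth=60),   # low support, rep2 only
]

print(concordance(rep1, rep2))                          # 0.5
print(concordance(rep1, rep2, min_vac=5, min_vaf=0.1))  # 1.0
```

In this toy case the calls seen in only one replicate are exactly the low-support ones, so raising the VAC and VAF cut-offs lifts concordance from 0.5 to 1.0, mirroring the direction of the effect the abstract reports. The same cut-offs would also discard any genuine variants that fall below them, so the thresholds trade sensitivity for reproducibility.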
Journal: PLoS One (Research Article), Public Library of Science; published 2015-07-02
Record source: PubMed (National Center for Biotechnology Information), record ID pubmed-4489803
License: https://creativecommons.org/publicdomain/zero/1.0/. This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.