Medical implications of technical accuracy in genome sequencing

Bibliographic Details
Main authors: Goldfeder, Rachel L., Priest, James R., Zook, Justin M., Grove, Megan E., Waggott, Daryl, Wheeler, Matthew T., Salit, Marc, Ashley, Euan A.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4774017/
https://www.ncbi.nlm.nih.gov/pubmed/26932475
http://dx.doi.org/10.1186/s13073-016-0269-0

Abstract
BACKGROUND: As whole exome sequencing (WES) and whole genome sequencing (WGS) transition from research tools to clinical diagnostic tests, it is increasingly critical for sequencing methods and analysis pipelines to be technically accurate. The Genome in a Bottle Consortium has recently published a set of benchmark SNV, indel, and homozygous reference genotypes for the pilot whole genome NIST Reference Material based on the NA12878 genome.
METHODS: We examine the relationship between human genome complexity and genes/variants reported to be associated with human disease. Specifically, we map regions of medical relevance to benchmark regions of high or low confidence. We use benchmark data to assess the sensitivity and positive predictive value of two representative sequencing pipelines for specific classes of variation.
RESULTS: We observe that the accuracy of a variant call depends on the genomic region, variant type, and read depth, and varies by analytical pipeline. We find that most false negative WGS calls result from filtering while most false negative WES variants relate to poor coverage. We find that only 74.6 % of the exonic bases in ClinVar and OMIM genes and 82.1 % of the exonic bases in ACMG-reportable genes are found in high-confidence regions. Only 990 genes in the genome are found entirely within high-confidence regions while 593 of 3,300 ClinVar/OMIM genes have less than 50 % of their total exonic base pairs in high-confidence regions. We find greater than 77 % of the pathogenic or likely pathogenic SNVs currently in ClinVar fall within high-confidence regions. We identify sites that are prone to sequencing errors, including thousands present in publicly available variant databases. Finally, we examine the clinical impact of mandatory reporting of secondary findings, highlighting a false positive variant found in BRCA2.
CONCLUSIONS: Together, these data illustrate the importance of appropriate use and continued improvement of technical benchmarks to ensure accurate and judicious interpretation of next-generation DNA sequencing results in the clinical setting.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13073-016-0269-0) contains supplementary material, which is available to authorized users.
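
The two kinds of figures reported in the abstract can be illustrated with a minimal sketch. Assuming calls have already been matched against the benchmark into true positives (TP), false positives (FP) and false negatives (FN), sensitivity is TP / (TP + FN) and positive predictive value is TP / (TP + FP). The share of exonic bases in high-confidence regions is the size of the intersection between exon intervals and high-confidence intervals divided by the total exonic bases. The function names, the half-open BED-style interval convention and the toy numbers below are hypothetical illustrations, not the authors' pipeline or data.

```python
# Illustrative sketch only: function names and toy inputs are hypothetical,
# not the authors' code or results.
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome (BED-style)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of benchmark variants that the pipeline recovered."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: fraction of calls confirmed by the benchmark."""
    return tp / (tp + fp)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(exons: List[Interval], high_conf: List[Interval]) -> float:
    """Share of exonic bases that fall inside high-confidence intervals."""
    exons, high_conf = merge(exons), merge(high_conf)
    total = sum(end - start for start, end in exons)
    overlap = 0
    i = j = 0
    # Two-pointer sweep over both sorted, non-overlapping interval lists.
    while i < len(exons) and j < len(high_conf):
        lo = max(exons[i][0], high_conf[j][0])
        hi = min(exons[i][1], high_conf[j][1])
        if lo < hi:
            overlap += hi - lo
        # Advance whichever interval ends first.
        if exons[i][1] < high_conf[j][1]:
            i += 1
        else:
            j += 1
    return overlap / total if total else 0.0


if __name__ == "__main__":
    # Toy counts and intervals, not data from the study.
    print(f"sensitivity = {sensitivity(tp=95, fn=5):.3f}")
    print(f"PPV         = {ppv(tp=95, fp=2):.3f}")
    exons = [(100, 200), (300, 400)]
    high_conf = [(150, 350)]
    print(f"exonic bases in high-confidence regions = {covered_fraction(exons, high_conf):.1%}")
```

With these toy inputs, half of the 200 exonic bases fall inside the single high-confidence interval, so the reported share is 50.0%. Merging first keeps the sweep linear after sorting and avoids double-counting bases covered by overlapping benchmark intervals.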

Record ID: pubmed-4774017 (MEDLINE/PubMed record, National Center for Biotechnology Information)
Journal: Genome Med, Research article. BioMed Central, published 2016-03-02.
© Goldfeder et al. 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.