Estimating genotype error rates from high-coverage next-generation sequence data

Bibliographic Details
Main Authors: Wall, Jeffrey D., Tang, Ling Fung, Zerbe, Brandon, Kvale, Mark N., Kwok, Pui-Yan, Schaefer, Catherine, Risch, Neil
Format: Online, Article, Text
Language: English
Published: Cold Spring Harbor Laboratory Press, 2014
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4216915/
https://www.ncbi.nlm.nih.gov/pubmed/25304867
http://dx.doi.org/10.1101/gr.168393.113

Description: Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)–(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods.
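
The description states that replicate sequencing of the same individuals yields lower bounds on genotype error rates. As a minimal, hypothetical illustration of why (not the authors' procedure), the sketch below compares two replicates at sites where either one reports a nonreference genotype and counts disagreements. The dictionary input, the 0/1/2 genotype coding, the site keys and the function name `nonref_discordance` are all assumptions made for this example.

```python
from typing import Dict, Hashable

# Hypothetical input format: each replicate maps a site key (e.g. "chr1:1000")
# to a genotype coded as the number of nonreference alleles (0, 1 or 2).


def nonref_discordance(rep1: Dict[Hashable, int], rep2: Dict[Hashable, int]) -> float:
    """Return the fraction of informative sites at which two replicates disagree.

    A site is informative if it was called in both replicates and at least one
    replicate reports a nonreference genotype (code 1 or 2).
    Returns NaN when there are no informative sites.
    """
    # Sites called in both replicates (set intersection of the dict key views).
    shared = rep1.keys() & rep2.keys()
    # Keep only sites where at least one replicate calls a nonreference genotype.
    informative = [s for s in shared if rep1[s] > 0 or rep2[s] > 0]
    if not informative:
        return float("nan")
    discordant = sum(1 for s in informative if rep1[s] != rep2[s])
    return discordant / len(informative)


if __name__ == "__main__":
    # Toy genotype calls for one individual (made-up values, not data from the paper).
    blood = {"chr1:1000": 1, "chr1:2000": 2, "chr1:3000": 0, "chr1:4000": 1}
    saliva = {"chr1:1000": 1, "chr1:2000": 1, "chr1:3000": 0, "chr1:4000": 1}
    d = nonref_discordance(blood, saliva)
    # Each discordant site holds at least one wrong call out of two, so d / 2
    # bounds the fraction of wrong calls at these sites from below.
    print(f"discordance = {d:.3f}, lower bound on per-call error = {d / 2:.3f}")
```

With the toy data, three sites are informative and one disagrees, giving a discordance of 1/3 and a per-call lower bound of 1/6. The figure is only a lower bound because an error made identically in both replicates leaves the calls concordant and is never counted.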

Record ID: pubmed-4216915
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Genome Res (Research)
Publication date: 2014-11
Copyright: © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press
License: This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.