Estimating genotype error rates from high-coverage next-generation sequence data
Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to esti...
Main authors: | Wall, Jeffrey D.; Tang, Ling Fung; Zerbe, Brandon; Kvale, Mark N.; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press, 2014 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4216915/ https://www.ncbi.nlm.nih.gov/pubmed/25304867 http://dx.doi.org/10.1101/gr.168393.113 |
_version_ | 1782342327228956672 |
---|---|
author | Wall, Jeffrey D. Tang, Ling Fung Zerbe, Brandon Kvale, Mark N. Kwok, Pui-Yan Schaefer, Catherine Risch, Neil |
author_facet | Wall, Jeffrey D. Tang, Ling Fung Zerbe, Brandon Kvale, Mark N. Kwok, Pui-Yan Schaefer, Catherine Risch, Neil |
author_sort | Wall, Jeffrey D. |
collection | PubMed |
description | Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)–(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods. |
format | Online Article Text |
id | pubmed-4216915 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4216915 2015-05-01 Genome Res Research Cold Spring Harbor Laboratory Press 2014-11 /pmc/articles/PMC4216915/ /pubmed/25304867 http://dx.doi.org/10.1101/gr.168393.113 Text en © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. |
title | Estimating genotype error rates from high-coverage next-generation sequence data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4216915/ https://www.ncbi.nlm.nih.gov/pubmed/25304867 http://dx.doi.org/10.1101/gr.168393.113 |