Analysis of error profiles in deep next-generation sequencing data
BACKGROUND: Sequencing errors are key confounding factors for detecting low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS). However, there is a lack of comprehensive understanding of errors introdu...
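Main authors: | Ma, Xiaotu; Shao, Ying; Tian, Liqing; Flasch, Diane A.; Mulder, Heather L.; Edmonson, Michael N.; Liu, Yu; Chen, Xiang; Newman, Scott; Nakitandwe, Joy; Li, Yongjin; Li, Benshang; Shen, Shuhong; Wang, Zhaoming; Shurtleff, Sheila; Robison, Leslie L.; Levy, Shawn; Easton, John; Zhang, Jinghui |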
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417284/ https://www.ncbi.nlm.nih.gov/pubmed/30867008 http://dx.doi.org/10.1186/s13059-019-1659-6 |
_version_ | 1783403539072024576 |
---|---|
author | Ma, Xiaotu Shao, Ying Tian, Liqing Flasch, Diane A. Mulder, Heather L. Edmonson, Michael N. Liu, Yu Chen, Xiang Newman, Scott Nakitandwe, Joy Li, Yongjin Li, Benshang Shen, Shuhong Wang, Zhaoming Shurtleff, Sheila Robison, Leslie L. Levy, Shawn Easton, John Zhang, Jinghui |
author_facet | Ma, Xiaotu Shao, Ying Tian, Liqing Flasch, Diane A. Mulder, Heather L. Edmonson, Michael N. Liu, Yu Chen, Xiang Newman, Scott Nakitandwe, Joy Li, Yongjin Li, Benshang Shen, Shuhong Wang, Zhaoming Shurtleff, Sheila Robison, Leslie L. Levy, Shawn Easton, John Zhang, Jinghui |
author_sort | Ma, Xiaotu |
collection | PubMed |
description | BACKGROUND: Sequencing errors are key confounding factors for detecting low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS). However, there is a lack of comprehensive understanding of errors introduced at various steps of a conventional NGS workflow, such as sample handling, library preparation, PCR enrichment, and sequencing. In this study, we use current NGS technology to systematically investigate these questions. RESULTS: By evaluating read-specific error distributions, we discover that the substitution error rate can be computationally suppressed to 10^−5 to 10^−4, which is 10- to 100-fold lower than generally considered achievable (10^−3) in the current literature. We then quantify substitution errors attributable to sample handling, library preparation, enrichment PCR, and sequencing by using multiple deep sequencing datasets. We find that error rates differ by nucleotide substitution types, ranging from 10^−5 for A>C/T>G, C>A/G>T, and C>G/G>C changes to 10^−4 for A>G/T>C changes. Furthermore, C>T/G>A errors exhibit strong sequence context dependency, sample-specific effects dominate elevated C>A/G>T errors, and target-enrichment PCR led to a ~6-fold increase in the overall error rate. We also find that more than 70% of hotspot variants can be detected at 0.1% to 0.01% frequency with the current NGS technology by applying in silico error suppression. CONCLUSIONS: We present the first comprehensive analysis of sequencing error sources in conventional NGS workflows. The error profiles revealed by our study highlight new directions for further improving NGS analysis accuracy both experimentally and computationally, ultimately enhancing the precision of deep sequencing. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1659-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6417284 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-64172842019-03-25 Analysis of error profiles in deep next-generation sequencing data Ma, Xiaotu Shao, Ying Tian, Liqing Flasch, Diane A. Mulder, Heather L. Edmonson, Michael N. Liu, Yu Chen, Xiang Newman, Scott Nakitandwe, Joy Li, Yongjin Li, Benshang Shen, Shuhong Wang, Zhaoming Shurtleff, Sheila Robison, Leslie L. Levy, Shawn Easton, John Zhang, Jinghui Genome Biol Research BACKGROUND: Sequencing errors are key confounding factors for detecting low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS). However, there is a lack of comprehensive understanding of errors introduced at various steps of a conventional NGS workflow, such as sample handling, library preparation, PCR enrichment, and sequencing. In this study, we use current NGS technology to systematically investigate these questions. RESULTS: By evaluating read-specific error distributions, we discover that the substitution error rate can be computationally suppressed to 10^−5 to 10^−4, which is 10- to 100-fold lower than generally considered achievable (10^−3) in the current literature. We then quantify substitution errors attributable to sample handling, library preparation, enrichment PCR, and sequencing by using multiple deep sequencing datasets. We find that error rates differ by nucleotide substitution types, ranging from 10^−5 for A>C/T>G, C>A/G>T, and C>G/G>C changes to 10^−4 for A>G/T>C changes. Furthermore, C>T/G>A errors exhibit strong sequence context dependency, sample-specific effects dominate elevated C>A/G>T errors, and target-enrichment PCR led to a ~6-fold increase in the overall error rate. We also find that more than 70% of hotspot variants can be detected at 0.1% to 0.01% frequency with the current NGS technology by applying in silico error suppression. CONCLUSIONS: We present the first comprehensive analysis of sequencing error sources in conventional NGS workflows. The error profiles revealed by our study highlight new directions for further improving NGS analysis accuracy both experimentally and computationally, ultimately enhancing the precision of deep sequencing. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1659-6) contains supplementary material, which is available to authorized users. BioMed Central 2019-03-14 /pmc/articles/PMC6417284/ /pubmed/30867008 http://dx.doi.org/10.1186/s13059-019-1659-6 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Ma, Xiaotu Shao, Ying Tian, Liqing Flasch, Diane A. Mulder, Heather L. Edmonson, Michael N. Liu, Yu Chen, Xiang Newman, Scott Nakitandwe, Joy Li, Yongjin Li, Benshang Shen, Shuhong Wang, Zhaoming Shurtleff, Sheila Robison, Leslie L. Levy, Shawn Easton, John Zhang, Jinghui Analysis of error profiles in deep next-generation sequencing data |
title | Analysis of error profiles in deep next-generation sequencing data |
title_full | Analysis of error profiles in deep next-generation sequencing data |
title_fullStr | Analysis of error profiles in deep next-generation sequencing data |
title_full_unstemmed | Analysis of error profiles in deep next-generation sequencing data |
title_short | Analysis of error profiles in deep next-generation sequencing data |
title_sort | analysis of error profiles in deep next-generation sequencing data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417284/ https://www.ncbi.nlm.nih.gov/pubmed/30867008 http://dx.doi.org/10.1186/s13059-019-1659-6 |
work_keys_str_mv | AT maxiaotu analysisoferrorprofilesindeepnextgenerationsequencingdata AT shaoying analysisoferrorprofilesindeepnextgenerationsequencingdata AT tianliqing analysisoferrorprofilesindeepnextgenerationsequencingdata AT flaschdianea analysisoferrorprofilesindeepnextgenerationsequencingdata AT mulderheatherl analysisoferrorprofilesindeepnextgenerationsequencingdata AT edmonsonmichaeln analysisoferrorprofilesindeepnextgenerationsequencingdata AT liuyu analysisoferrorprofilesindeepnextgenerationsequencingdata AT chenxiang analysisoferrorprofilesindeepnextgenerationsequencingdata AT newmanscott analysisoferrorprofilesindeepnextgenerationsequencingdata AT nakitandwejoy analysisoferrorprofilesindeepnextgenerationsequencingdata AT liyongjin analysisoferrorprofilesindeepnextgenerationsequencingdata AT libenshang analysisoferrorprofilesindeepnextgenerationsequencingdata AT shenshuhong analysisoferrorprofilesindeepnextgenerationsequencingdata AT wangzhaoming analysisoferrorprofilesindeepnextgenerationsequencingdata AT shurtleffsheila analysisoferrorprofilesindeepnextgenerationsequencingdata AT robisonlesliel analysisoferrorprofilesindeepnextgenerationsequencingdata AT levyshawn analysisoferrorprofilesindeepnextgenerationsequencingdata AT eastonjohn analysisoferrorprofilesindeepnextgenerationsequencingdata AT zhangjinghui analysisoferrorprofilesindeepnextgenerationsequencingdata |
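
The abstract in this record reports error rates per strand-symmetric substitution class (for example C>T/G>A, where a C>T change on one strand is read as G>A on the other). The following is a minimal Python sketch of how such classes and per-class rates could be tallied. It is not the authors' pipeline: the function names, the input dictionaries, and the choice of denominator (total sequenced bases at positions whose reference base is either member of the pair) are assumptions made for illustration only.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return the strand-symmetric label for a substitution, e.g. 'C>T/G>A'.

    Hypothetical helper: the label format mirrors the notation used in the
    abstract, with the A- or C-referenced form written first.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AC":
        a, b = ref, alt
    else:
        # Map G- and T-referenced changes onto the complementary strand.
        a, b = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{a}>{b}/{COMPLEMENT[a]}>{COMPLEMENT[b]}"


def class_error_rates(mismatch_counts, ref_depth):
    """Aggregate mismatch counts into per-class error rates.

    mismatch_counts: {(ref, alt): number of mismatching base calls}
    ref_depth: {ref_base: total base calls at positions with that reference}
    Assumed denominator: depth at both reference bases of the class.
    """
    errors = Counter()
    for (ref, alt), n in mismatch_counts.items():
        errors[substitution_class(ref, alt)] += n
    rates = {}
    for label, n in errors.items():
        base = label[0]  # the A- or C-referenced member of the pair
        denom = ref_depth.get(base, 0) + ref_depth.get(COMPLEMENT[base], 0)
        if denom:
            rates[label] = n / denom
    return rates


if __name__ == "__main__":
    # Toy numbers, invented for the example (not data from the article).
    counts = {("C", "T"): 30, ("G", "A"): 10, ("A", "G"): 5}
    depth = {"A": 100_000, "C": 200_000, "G": 200_000, "T": 100_000}
    for label, rate in sorted(class_error_rates(counts, depth).items()):
        print(f"{label}\t{rate:.2e}")
```

Collapsing the twelve possible substitutions into six complementary pairs keeps the tally independent of which strand a read came from, which is why the abstract quotes rates for pairs rather than for individual changes.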