SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data
BACKGROUND: There is currently no method to precisely measure the errors that occur in the sequencing instrument/sequencer, which is critical for next-generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations. RESULTS: We propose a novel computa...
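Main authors: | Davis, Eric M.; Sun, Yu; Liu, Yanling; Kolekar, Pandurang; Shao, Ying; Szlachta, Karol; Mulder, Heather L.; Ren, Dongren; Rice, Stephen V.; Wang, Zhaoming; Nakitandwe, Joy; Gout, Alexander M.; Shaner, Bridget; Hall, Salina; Robison, Leslie L.; Pounds, Stanley; Klco, Jeffery M.; Easton, John; Ma, Xiaotu |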
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7829059/ https://www.ncbi.nlm.nih.gov/pubmed/33487172 http://dx.doi.org/10.1186/s13059-020-02254-2 |
_version_ | 1783641109983920128 |
---|---|
author | Davis, Eric M. Sun, Yu Liu, Yanling Kolekar, Pandurang Shao, Ying Szlachta, Karol Mulder, Heather L. Ren, Dongren Rice, Stephen V. Wang, Zhaoming Nakitandwe, Joy Gout, Alexander M. Shaner, Bridget Hall, Salina Robison, Leslie L. Pounds, Stanley Klco, Jeffery M. Easton, John Ma, Xiaotu |
author_facet | Davis, Eric M. Sun, Yu Liu, Yanling Kolekar, Pandurang Shao, Ying Szlachta, Karol Mulder, Heather L. Ren, Dongren Rice, Stephen V. Wang, Zhaoming Nakitandwe, Joy Gout, Alexander M. Shaner, Bridget Hall, Salina Robison, Leslie L. Pounds, Stanley Klco, Jeffery M. Easton, John Ma, Xiaotu |
author_sort | Davis, Eric M. |
collection | PubMed |
description | BACKGROUND: There is currently no method to precisely measure the errors that occur in the sequencing instrument/sequencer, which is critical for next-generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations. RESULTS: We propose a novel computational method, SequencErr, to address this challenge by measuring the base correspondence between overlapping regions in forward and reverse reads. An analysis of 3777 public datasets from 75 research institutions in 18 countries revealed the sequencer error rate to be ~ 10 per million (pm) and 1.4% of sequencers and 2.7% of flow cells have error rates > 100 pm. At the flow cell level, error rates are elevated in the bottom surfaces and > 90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy, and removal of outlier error-prone tiles improves sequencing accuracy. We demonstrate that SequencErr can reveal novel insights relative to the popular quality control method FastQC and achieve a 10-fold lower error rate than popular error correction methods including Lighter and Musket. CONCLUSIONS: Our study reveals novel insights into the nature of DNA sequencing errors incurred on DNA sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets. |
format | Online Article Text |
id | pubmed-7829059 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-78290592021-01-25 SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data Davis, Eric M. Sun, Yu Liu, Yanling Kolekar, Pandurang Shao, Ying Szlachta, Karol Mulder, Heather L. Ren, Dongren Rice, Stephen V. Wang, Zhaoming Nakitandwe, Joy Gout, Alexander M. Shaner, Bridget Hall, Salina Robison, Leslie L. Pounds, Stanley Klco, Jeffery M. Easton, John Ma, Xiaotu Genome Biol Research BACKGROUND: There is currently no method to precisely measure the errors that occur in the sequencing instrument/sequencer, which is critical for next-generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations. RESULTS: We propose a novel computational method, SequencErr, to address this challenge by measuring the base correspondence between overlapping regions in forward and reverse reads. An analysis of 3777 public datasets from 75 research institutions in 18 countries revealed the sequencer error rate to be ~ 10 per million (pm) and 1.4% of sequencers and 2.7% of flow cells have error rates > 100 pm. At the flow cell level, error rates are elevated in the bottom surfaces and > 90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy, and removal of outlier error-prone tiles improves sequencing accuracy. We demonstrate that SequencErr can reveal novel insights relative to the popular quality control method FastQC and achieve a 10-fold lower error rate than popular error correction methods including Lighter and Musket. CONCLUSIONS: Our study reveals novel insights into the nature of DNA sequencing errors incurred on DNA sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets. 
BioMed Central 2021-01-25 /pmc/articles/PMC7829059/ /pubmed/33487172 http://dx.doi.org/10.1186/s13059-020-02254-2 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Davis, Eric M. Sun, Yu Liu, Yanling Kolekar, Pandurang Shao, Ying Szlachta, Karol Mulder, Heather L. Ren, Dongren Rice, Stephen V. Wang, Zhaoming Nakitandwe, Joy Gout, Alexander M. Shaner, Bridget Hall, Salina Robison, Leslie L. Pounds, Stanley Klco, Jeffery M. Easton, John Ma, Xiaotu SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data |
title | SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data |
title_sort | sequencerr: measuring and suppressing sequencer errors in next-generation sequencing data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7829059/ https://www.ncbi.nlm.nih.gov/pubmed/33487172 http://dx.doi.org/10.1186/s13059-020-02254-2 |
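The description field above says that SequencErr measures the base correspondence between overlapping regions in forward and reverse reads and reports error rates per million (pm). The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes the fragment length is already known (for example, from alignment), that neither read runs past the end of the fragment, and that positions containing `N` are skipped. The function names `reverse_complement`, `overlap_disagreements`, and `disagreements_per_million` are hypothetical, and the value returned is a simple disagreement rate, which may differ from the error-rate definition used in the article.

```python
# Illustrative sketch only: names and conventions here are assumptions,
# not the SequencErr implementation.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_disagreements(forward, reverse, fragment_length):
    """Count disagreeing bases where a read pair covers the same positions.

    forward          read 1, as sequenced from the fragment's 5' end
    reverse          read 2, as sequenced from the opposite strand
    fragment_length  assumed known length of the sequenced fragment

    Returns (disagreements, compared_positions).
    """
    if fragment_length < max(len(forward), len(reverse)):
        raise ValueError("reads longer than the fragment are not handled here")

    overlap = len(forward) + len(reverse) - fragment_length
    if overlap <= 0:
        return (0, 0)  # the two reads do not overlap

    # Put read 2 on the same strand and orientation as read 1.
    reverse_on_forward = reverse_complement(reverse)

    # In fragment coordinates the overlap spans
    # [fragment_length - len(reverse), len(forward)).
    forward_part = forward[fragment_length - len(reverse):]
    reverse_part = reverse_on_forward[:overlap]

    disagreements = 0
    compared = 0
    for a, b in zip(forward_part, reverse_part):
        if a == "N" or b == "N":
            continue  # skip positions with no base call
        compared += 1
        if a != b:
            disagreements += 1
    return (disagreements, compared)


def disagreements_per_million(pairs):
    """Aggregate over (forward, reverse, fragment_length) tuples."""
    total_disagreements = 0
    total_compared = 0
    for forward, reverse, fragment_length in pairs:
        d, c = overlap_disagreements(forward, reverse, fragment_length)
        total_disagreements += d
        total_compared += c
    if total_compared == 0:
        raise ValueError("no overlapping bases to compare")
    return total_disagreements / total_compared * 1_000_000
```

For a 10-base fragment `ACGTACGTAC` read with two 7-base reads, the pair `("ACGTACG", "GTACGTA", 10)` overlaps over 4 positions with no disagreement; changing one overlapping base in the forward read gives 1 disagreement in 4 compared positions.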