SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data

BACKGROUND: There is currently no method to precisely measure the errors that occur in the sequencing instrument/sequencer, which is critical for next-generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations. RESULTS: We propose a novel computa...

Bibliographic Details
Main Authors: Davis, Eric M., Sun, Yu, Liu, Yanling, Kolekar, Pandurang, Shao, Ying, Szlachta, Karol, Mulder, Heather L., Ren, Dongren, Rice, Stephen V., Wang, Zhaoming, Nakitandwe, Joy, Gout, Alexander M., Shaner, Bridget, Hall, Salina, Robison, Leslie L., Pounds, Stanley, Klco, Jeffery M., Easton, John, Ma, Xiaotu
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7829059/
https://www.ncbi.nlm.nih.gov/pubmed/33487172
http://dx.doi.org/10.1186/s13059-020-02254-2
author Davis, Eric M.
Sun, Yu
Liu, Yanling
Kolekar, Pandurang
Shao, Ying
Szlachta, Karol
Mulder, Heather L.
Ren, Dongren
Rice, Stephen V.
Wang, Zhaoming
Nakitandwe, Joy
Gout, Alexander M.
Shaner, Bridget
Hall, Salina
Robison, Leslie L.
Pounds, Stanley
Klco, Jeffery M.
Easton, John
Ma, Xiaotu
author_sort Davis, Eric M.
collection PubMed
description BACKGROUND: There is currently no method to precisely measure the errors that occur in the sequencing instrument/sequencer, which is critical for next-generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations. RESULTS: We propose a novel computational method, SequencErr, to address this challenge by measuring the base correspondence between overlapping regions in forward and reverse reads. An analysis of 3777 public datasets from 75 research institutions in 18 countries revealed the sequencer error rate to be ~ 10 per million (pm) and 1.4% of sequencers and 2.7% of flow cells have error rates > 100 pm. At the flow cell level, error rates are elevated in the bottom surfaces and > 90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy, and removal of outlier error-prone tiles improves sequencing accuracy. We demonstrate that SequencErr can reveal novel insights relative to the popular quality control method FastQC and achieve a 10-fold lower error rate than popular error correction methods including Lighter and Musket. CONCLUSIONS: Our study reveals novel insights into the nature of DNA sequencing errors incurred on DNA sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets.
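The description above says the method measures base correspondence between the overlapping regions of forward and reverse reads. The following is a minimal sketch of that idea only, not the SequencErr implementation: it assumes the length of the overlap for each read pair is already known (for example from alignment), skips positions called as N, and reports discordant bases per million compared. The function names and the input layout are hypothetical and chosen for illustration.

```python
from typing import Iterable, Tuple

# Complement table for the four bases plus N (assumed uppercase input).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_mismatches(fwd: str, rev: str, overlap_len: int) -> Tuple[int, int]:
    """Count (mismatched, compared) bases in the overlap of one read pair.

    Assumption: the last `overlap_len` bases of the forward read cover the
    same fragment positions as the first `overlap_len` bases of the
    reverse-complemented reverse read.
    """
    if not 0 < overlap_len <= min(len(fwd), len(rev)):
        raise ValueError("overlap_len must fit inside both reads")
    fwd_part = fwd[-overlap_len:]
    rev_part = reverse_complement(rev)[:overlap_len]
    mismatched = compared = 0
    for a, b in zip(fwd_part, rev_part):
        if a == "N" or b == "N":
            continue  # no-calls carry no information about agreement
        compared += 1
        if a != b:
            mismatched += 1
    return mismatched, compared


def discordance_per_million(pairs: Iterable[Tuple[str, str, int]]) -> float:
    """Pool read pairs given as (forward, reverse, overlap_len) tuples and
    return discordant overlap bases per million compared bases."""
    total_mismatched = total_compared = 0
    for fwd, rev, overlap_len in pairs:
        mismatched, compared = overlap_mismatches(fwd, rev, overlap_len)
        total_mismatched += mismatched
        total_compared += compared
    if total_compared == 0:
        raise ValueError("no comparable overlap bases")
    return 1e6 * total_mismatched / total_compared
```

A discordant base only shows that at least one of the two reads is wrong at that position, so this pooled rate is a rough proxy. It is not the calibrated sequencer error rate reported in the article.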
format Online
Article
Text
id pubmed-7829059
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7829059 2021-01-25 SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data Genome Biol Research
BioMed Central 2021-01-25 /pmc/articles/PMC7829059/ /pubmed/33487172 http://dx.doi.org/10.1186/s13059-020-02254-2 Text en © The Author(s) 2021 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data
topic Research