Sequencing error profiles of Illumina sequencing instruments

Sequencing technology has achieved great advances in the past decade. Studies have previously shown the quality of specific instruments in controlled conditions. Here, we developed a method able to retroactively determine the error rate of most public sequencing datasets. To do this, we utilized the...


Bibliographic Details
Main authors: Stoler, Nicholas, Nekrutenko, Anton
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8002175/
https://www.ncbi.nlm.nih.gov/pubmed/33817639
http://dx.doi.org/10.1093/nargab/lqab019
_version_ 1783671402187980800
author Stoler, Nicholas
Nekrutenko, Anton
author_facet Stoler, Nicholas
Nekrutenko, Anton
author_sort Stoler, Nicholas
collection PubMed
description Sequencing technology has achieved great advances in the past decade. Studies have previously shown the quality of specific instruments in controlled conditions. Here, we developed a method able to retroactively determine the error rate of most public sequencing datasets. To do this, we utilized the overlaps between reads that are a feature of many sequencing libraries. With this method, we surveyed 1943 different datasets from seven different sequencing instruments produced by Illumina. We show that among public datasets, the more expensive platforms like HiSeq and NovaSeq have a lower error rate and less variation. But we also discovered that there is great variation within each platform, with the accuracy of a sequencing experiment depending greatly on the experimenter. We show the importance of sequence context, especially the phenomenon where preceding bases bias the following bases toward the same identity. We also show the difference in patterns of sequence bias between instruments. Contrary to expectations based on the underlying chemistry, HiSeq X Ten and NovaSeq 6000 share notable exceptions to the preceding-base bias. Our results demonstrate the importance of the specific circumstances of every sequencing experiment, and the importance of evaluating the quality of each one.
format Online
Article
Text
id pubmed-8002175
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-80021752021-04-01 Sequencing error profiles of Illumina sequencing instruments Stoler, Nicholas Nekrutenko, Anton NAR Genom Bioinform Methart Sequencing technology has achieved great advances in the past decade. Studies have previously shown the quality of specific instruments in controlled conditions. Here, we developed a method able to retroactively determine the error rate of most public sequencing datasets. To do this, we utilized the overlaps between reads that are a feature of many sequencing libraries. With this method, we surveyed 1943 different datasets from seven different sequencing instruments produced by Illumina. We show that among public datasets, the more expensive platforms like HiSeq and NovaSeq have a lower error rate and less variation. But we also discovered that there is great variation within each platform, with the accuracy of a sequencing experiment depending greatly on the experimenter. We show the importance of sequence context, especially the phenomenon where preceding bases bias the following bases toward the same identity. We also show the difference in patterns of sequence bias between instruments. Contrary to expectations based on the underlying chemistry, HiSeq X Ten and NovaSeq 6000 share notable exceptions to the preceding-base bias. Our results demonstrate the importance of the specific circumstances of every sequencing experiment, and the importance of evaluating the quality of each one. Oxford University Press 2021-03-27 /pmc/articles/PMC8002175/ /pubmed/33817639 http://dx.doi.org/10.1093/nargab/lqab019 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methart
Stoler, Nicholas
Nekrutenko, Anton
Sequencing error profiles of Illumina sequencing instruments
title Sequencing error profiles of Illumina sequencing instruments
title_full Sequencing error profiles of Illumina sequencing instruments
title_fullStr Sequencing error profiles of Illumina sequencing instruments
title_full_unstemmed Sequencing error profiles of Illumina sequencing instruments
title_short Sequencing error profiles of Illumina sequencing instruments
title_sort sequencing error profiles of illumina sequencing instruments
topic Methart
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8002175/
https://www.ncbi.nlm.nih.gov/pubmed/33817639
http://dx.doi.org/10.1093/nargab/lqab019
work_keys_str_mv AT stolernicholas sequencingerrorprofilesofilluminasequencinginstruments
AT nekrutenkoanton sequencingerrorprofilesofilluminasequencinginstruments