
Accurate gene consensus at low nanopore coverage

BACKGROUND: Nanopore technologies allow high-throughput sequencing of long strands of DNA at the cost of a relatively large error rate. This limits its use in the reading of amplicon libraries in which there are only a few mutations per variant and therefore they are easily confused with the sequenc...

Bibliographic Details
Main authors: Espada, Rocío; Zarevski, Nikola; Dramé-Maigné, Adèle; Rondelez, Yannick
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9646519/
https://www.ncbi.nlm.nih.gov/pubmed/36352541
http://dx.doi.org/10.1093/gigascience/giac102
author Espada, Rocío
Zarevski, Nikola
Dramé-Maigné, Adèle
Rondelez, Yannick
collection PubMed
description BACKGROUND: Nanopore technologies allow high-throughput sequencing of long strands of DNA at the cost of a relatively high error rate. This limits their use for reading amplicon libraries, in which each variant carries only a few mutations that are easily confused with sequencing noise. Consensus calling strategies reduce the error but sacrifice part of the throughput, because each member of the library is typically read 30 to 100 times. FINDINGS: In this work, we introduce SINGLe (SNPs In Nanopore reads of Gene Libraries), an error correction method to reduce the noise in nanopore reads of amplicons containing point variations. SINGLe exploits the fact that, in an amplicon library, all reads are very similar to a wild-type sequence, from which the position-specific systematic sequencing error pattern can be characterized experimentally. It then uses this information to reweight the confidence given to nucleotides that do not match the wild-type in individual variant reads and incorporates it into the consensus calculation. CONCLUSIONS: We tested SINGLe on a mutagenic library of the KlenTaq polymerase gene, where the true mutation rate was below the sequencing noise. We observed that, contrary to other methods, SINGLe compensates for the systematic errors made by the basecallers. Consequently, SINGLe converges to the true sequence using as few as 5 reads per variant, fewer than the other available methods.
format Online
Article
Text
id pubmed-9646519
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9646519 2022-11-14 Accurate gene consensus at low nanopore coverage Espada, Rocío; Zarevski, Nikola; Dramé-Maigné, Adèle; Rondelez, Yannick GigaScience Technical Note Oxford University Press 2022-11-09 /pmc/articles/PMC9646519/ /pubmed/36352541 http://dx.doi.org/10.1093/gigascience/giac102 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Accurate gene consensus at low nanopore coverage
topic Technical Note
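
The reweighting idea summarized in the abstract can be illustrated with a small sketch. This is not the SINGLe implementation or its API. It is a simplified toy model under stated assumptions: reads are already aligned to the wild-type and have the same length, the error profile is a plain per-position frequency of wrong bases in wild-type reads, and the weight of a non-wild-type call is a simple Bayesian posterior with an assumed prior mutation rate. All function names and the `mutation_rate` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def phred_to_prob(q):
    """Probability that a call is correct according to its Phred score."""
    return 1.0 - 10.0 ** (-q / 10.0)


def fit_error_profile(wild_type, wt_reads):
    """Per-position frequency of each wrong base in reads known to be wild-type.

    Assumption: every mismatch in these reads is a sequencing error.
    """
    counts = [Counter() for _ in wild_type]
    for read in wt_reads:
        for pos, base in enumerate(read):
            if base != wild_type[pos]:
                counts[pos][base] += 1
    n = len(wt_reads)
    return [{b: c / n for b, c in ctr.items()} for ctr in counts]


def call_weight(base, qual, pos, wild_type, profile, mutation_rate):
    """Confidence given to one base call (hypothetical weighting scheme)."""
    p_q = phred_to_prob(qual)
    if base == wild_type[pos]:
        return p_q  # wild-type calls keep the basecaller's own confidence
    err = profile[pos].get(base, 0.0)         # systematic error frequency
    true_mut = (mutation_rate / 3.0) * p_q    # real mutation, read correctly
    artefact = (1.0 - mutation_rate) * err    # wild-type base, misread
    total = true_mut + artefact
    return true_mut / total if total > 0 else 0.0


def weighted_consensus(reads, quals, wild_type, profile, mutation_rate=0.01):
    """Pick, at each position, the base with the largest summed weight."""
    consensus = []
    for pos in range(len(wild_type)):
        score = defaultdict(float)
        for read, q in zip(reads, quals):
            score[read[pos]] += call_weight(
                read[pos], q[pos], pos, wild_type, profile, mutation_rate
            )
        consensus.append(max(score, key=score.get))
    return "".join(consensus)


def majority_consensus(reads):
    """Unweighted per-position majority vote, for comparison."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


# Toy data (invented for illustration): index 4 is systematically misread as G.
wild_type = "ACGTAC"
wt_reads = ["ACGTGC"] * 4 + ["ACGTAC"] * 6
profile = fit_error_profile(wild_type, wt_reads)

# Five reads of one variant whose true sequence is ATGTAC (C->T at index 1).
variant_reads = ["ATGTGC"] * 3 + ["ATGTAC"] * 2
variant_quals = [[20] * 6 for _ in variant_reads]

print(majority_consensus(variant_reads))  # ATGTGC (systematic error kept)
print(weighted_consensus(variant_reads, variant_quals, wild_type, profile))  # ATGTAC
```

In this toy case the plain majority vote reports the systematic G at index 4 as a mutation, while the weighted vote discounts it because the same miscall is frequent in wild-type reads. The true C to T change at index 1 is never seen as an error in wild-type reads, so it keeps full weight.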