Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres

Bibliographic Details
Main authors: Tan, Kar-Tong; Slevin, Michael K.; Meyerson, Matthew; Li, Heng
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9414165/
https://www.ncbi.nlm.nih.gov/pubmed/36028900
http://dx.doi.org/10.1186/s13059-022-02751-6
Description
Summary: Nanopore long-read sequencing is an emerging approach for studying genomes, including long repetitive elements like telomeres. Here, we report extensive basecalling-induced errors at telomere repeats across nanopore datasets, sequencing platforms, basecallers, and basecalling models. We find that telomeres in many organisms are frequently miscalled. We demonstrate that tuning of nanopore basecalling models leads to improved recovery and analysis of telomeric regions, with minimal negative impact on other genomic regions. We highlight the importance of verifying nanopore basecalls in long, repetitive, and poorly defined regions, and showcase how artefacts can be resolved by improvements in nanopore basecalling models. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-022-02751-6.