Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly

Bibliographic Details
Main Authors: Holley, Guillaume; Beyter, Doruk; Ingimundardottir, Helga; Møller, Peter L.; Kristmundsdottir, Snædis; Eggertsson, Hannes P.; Halldorsson, Bjarni V.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7792008/
https://www.ncbi.nlm.nih.gov/pubmed/33419473
http://dx.doi.org/10.1186/s13059-020-02244-4
Description
Summary: A major challenge of long-read sequencing data is its high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short-read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel calling accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi reads assembly. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-020-02244-4.