SAKE: Strobemer-assisted k-mer extraction

K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Often the use of long k-mers is...

Bibliographic Details
Main Authors: Leinonen, Miika; Salmela, Leena
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10686461/
https://www.ncbi.nlm.nih.gov/pubmed/38019768
http://dx.doi.org/10.1371/journal.pone.0294415
author Leinonen, Miika
Salmela, Leena
collection PubMed
description K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Often the use of long k-mers is preferred because they can be uniquely associated with a specific genomic region. Unfortunately, it is not possible to reliably extract long k-mers in high error rate reads with standard exact k-mer counting methods. We propose SAKE, a method to extract long k-mers from high error rate reads by utilizing strobemers and consensus k-mer generation through partial order alignment. Our experiments show that on simulated data with up to 6% error rate, SAKE can extract 97-mers with over 90% recall. Conversely, the recall of DSK, an exact k-mer counter, drops to less than 20%. Furthermore, the precision of SAKE remains similar to DSK. On real bacterial data, SAKE retrieves 97-mers with a recall of over 90% and slightly lower precision than DSK, while the recall of DSK already drops to 50%. We show that SAKE can extract more k-mers from uncorrected high error rate reads compared to exact k-mer counting. However, exact k-mer counters run on corrected reads can extract slightly more k-mers than SAKE run on uncorrected reads.
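The recall and precision figures in the description can be made concrete with a small sketch. This is an illustration only, not SAKE or DSK: the function names and the toy sequences are hypothetical, and the error-free probability assumes independent per-base errors, which real reads only approximate.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recall_precision(extracted: set[str], reference: set[str]) -> tuple[float, float]:
    """Recall: share of reference k-mers that were extracted.
    Precision: share of extracted k-mers that are in the reference."""
    true_positives = len(extracted & reference)
    recall = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(extracted) if extracted else 0.0
    return recall, precision


def error_free_probability(k: int, error_rate: float) -> float:
    """Probability that k consecutive bases are all correct,
    assuming errors are independent per base (a simplifying assumption)."""
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    genome = "ACGTACGGTCA"  # toy reference sequence (hypothetical)
    read = "ACGTACGCTCA"    # same sequence with one substitution
    rec, prec = recall_precision(kmers(read, 4), kmers(genome, 4))
    print(f"k=4 recall={rec:.2f} precision={prec:.2f}")
    print(f"P(error-free 97-mer, 6% error) = {error_free_probability(97, 0.06):.4f}")
```

Under this simplified model, a single 97-base window in a read with a 6% error rate is error-free with probability well under 1%, which gives some intuition for why exact counting struggles with long k-mers in noisy reads. Actual recall also depends on coverage and on how errors are distributed, so the numbers reported in the description are not reproduced by this sketch.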
format Online
Article
Text
id pubmed-10686461
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10686461 2023-11-30 SAKE: Strobemer-assisted k-mer extraction. Leinonen, Miika; Salmela, Leena. PLoS One. Research Article. Public Library of Science, 2023-11-29. /pmc/articles/PMC10686461/ /pubmed/38019768 http://dx.doi.org/10.1371/journal.pone.0294415 Text en. © 2023 Leinonen, Salmela. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title SAKE: Strobemer-assisted k-mer extraction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10686461/
https://www.ncbi.nlm.nih.gov/pubmed/38019768
http://dx.doi.org/10.1371/journal.pone.0294415