SAKE: Strobemer-assisted k-mer extraction
K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Often the use of long k-mers is...
Main Authors: | Leinonen, Miika; Salmela, Leena |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10686461/ https://www.ncbi.nlm.nih.gov/pubmed/38019768 http://dx.doi.org/10.1371/journal.pone.0294415 |
_version_ | 1785151782471598080 |
---|---|
author | Leinonen, Miika Salmela, Leena |
author_facet | Leinonen, Miika Salmela, Leena |
author_sort | Leinonen, Miika |
collection | PubMed |
description | K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Often the use of long k-mers is preferred because they can be uniquely associated with a specific genomic region. Unfortunately, it is not possible to reliably extract long k-mers in high error rate reads with standard exact k-mer counting methods. We propose SAKE, a method to extract long k-mers from high error rate reads by utilizing strobemers and consensus k-mer generation through partial order alignment. Our experiments show that on simulated data with up to 6% error rate, SAKE can extract 97-mers with over 90% recall. Conversely, the recall of DSK, an exact k-mer counter, drops to less than 20%. Furthermore, the precision of SAKE remains similar to DSK. On real bacterial data, SAKE retrieves 97-mers with a recall of over 90% and slightly lower precision than DSK, while the recall of DSK already drops to 50%. We show that SAKE can extract more k-mers from uncorrected high error rate reads compared to exact k-mer counting. However, exact k-mer counters run on corrected reads can extract slightly more k-mers than SAKE run on uncorrected reads. |
format | Online Article Text |
id | pubmed-10686461 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10686461 2023-11-30 SAKE: Strobemer-assisted k-mer extraction Leinonen, Miika Salmela, Leena PLoS One Research Article K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Often the use of long k-mers is preferred because they can be uniquely associated with a specific genomic region. Unfortunately, it is not possible to reliably extract long k-mers in high error rate reads with standard exact k-mer counting methods. We propose SAKE, a method to extract long k-mers from high error rate reads by utilizing strobemers and consensus k-mer generation through partial order alignment. Our experiments show that on simulated data with up to 6% error rate, SAKE can extract 97-mers with over 90% recall. Conversely, the recall of DSK, an exact k-mer counter, drops to less than 20%. Furthermore, the precision of SAKE remains similar to DSK. On real bacterial data, SAKE retrieves 97-mers with a recall of over 90% and slightly lower precision than DSK, while the recall of DSK already drops to 50%. We show that SAKE can extract more k-mers from uncorrected high error rate reads compared to exact k-mer counting. However, exact k-mer counters run on corrected reads can extract slightly more k-mers than SAKE run on uncorrected reads. Public Library of Science 2023-11-29 /pmc/articles/PMC10686461/ /pubmed/38019768 http://dx.doi.org/10.1371/journal.pone.0294415 Text en © 2023 Leinonen, Salmela https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Leinonen, Miika Salmela, Leena SAKE: Strobemer-assisted k-mer extraction |
title | SAKE: Strobemer-assisted k-mer extraction |
title_full | SAKE: Strobemer-assisted k-mer extraction |
title_fullStr | SAKE: Strobemer-assisted k-mer extraction |
title_full_unstemmed | SAKE: Strobemer-assisted k-mer extraction |
title_short | SAKE: Strobemer-assisted k-mer extraction |
title_sort | sake: strobemer-assisted k-mer extraction |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10686461/ https://www.ncbi.nlm.nih.gov/pubmed/38019768 http://dx.doi.org/10.1371/journal.pone.0294415 |
work_keys_str_mv | AT leinonenmiika sakestrobemerassistedkmerextraction AT salmelaleena sakestrobemerassistedkmerextraction |
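
The description above contrasts exact k-mer counting with SAKE in terms of precision and recall. The sketch below is not SAKE or DSK; it is a minimal illustration, on hypothetical toy data, of what exact k-mer counting and those two metrics mean, and of why long k-mers rarely survive intact in high error rate reads. The function names, the toy reference, and the assumption that errors occur independently per base are all illustrative choices, not details taken from the article.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Exact k-mer counting: tally each k-mer exactly as it appears in the reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def precision_recall(extracted, reference):
    """Precision and recall of an extracted k-mer set against the reference k-mer set."""
    extracted, reference = set(extracted), set(reference)
    true_pos = len(extracted & reference)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


def error_free_probability(k, error_rate):
    """Chance that k consecutive bases are all correct (assumes independent errors)."""
    return (1.0 - error_rate) ** k


# Hypothetical toy data: the second read carries a single substitution (C -> G).
reference = "ACGTACGGTCAAGT"
reads = ["ACGTACGGTC", "ACGGTGAAGT"]
k = 5

ref_kmers = set(kmers(reference, k))
found = count_kmers(reads, k)
precision, recall = precision_recall(found, ref_kmers)
print(f"precision={precision:.3f} recall={recall:.3f}")

# Under the independence assumption, a 97-base window in a read with a 6%
# error rate is error-free well under 1% of the time.
print(f"P(error-free 97-mer) = {error_free_probability(97, 0.06):.4f}")
```

In the toy data, one substitution in the second read destroys every reference k-mer that overlaps it and introduces erroneous k-mers instead, lowering both recall and precision. The effect grows with k, which is the motivation the description gives for an error-tolerant extraction method.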