
Mapping-friendly sequence reductions: Going beyond homopolymer compression

Sequencing errors continue to pose algorithmic challenges to methods working with sequencing data. One of the simplest and most prevalent techniques for ameliorating the detrimental effects of homopolymer expansion/contraction errors present in long reads is homopolymer compression. It collapses run...
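As a rough illustration of the homopolymer compression mentioned in the summary, the sketch below collapses every run of identical nucleotides to a single character and applies the same transformation to a reference and a read before any alignment step. The function name `homopolymer_compress` and the two example sequences are hypothetical, chosen only for this illustration; this is not the authors' software, and it does not implement the more general mapping-friendly sequence reductions the paper introduces.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse each run of identical characters into a single character."""
    # groupby yields one (key, run) pair per maximal run of equal characters,
    # so keeping only the keys removes all repeats within a run.
    return "".join(base for base, _run in groupby(seq))


# Illustrative sequences (assumed, not taken from the paper): the read differs
# from the reference only in the lengths of its homopolymer runs.
reference = "ACGGGTTTTAAC"
read = "ACGGTTTTTTAC"

# The same transformation is applied to both sequences before alignment.
compressed_reference = homopolymer_compress(reference)
compressed_read = homopolymer_compress(read)

print(compressed_reference)
print(compressed_read)
```

After compression the two example sequences become identical, which is the intuition behind why the technique can hide homopolymer expansion/contraction errors from a downstream aligner.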


Bibliographic Details
Main authors: Blassel, Luc; Medvedev, Paul; Chikhi, Rayan
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9633736/
https://www.ncbi.nlm.nih.gov/pubmed/36339268
http://dx.doi.org/10.1016/j.isci.2022.105305
author Blassel, Luc
Medvedev, Paul
Chikhi, Rayan
collection PubMed
description Sequencing errors continue to pose algorithmic challenges to methods working with sequencing data. One of the simplest and most prevalent techniques for ameliorating the detrimental effects of homopolymer expansion/contraction errors present in long reads is homopolymer compression. It collapses runs of repeated nucleotides, to remove some sequencing errors and improve mapping sensitivity. Though our intuitive understanding justifies why homopolymer compression works, it in no way implies that it is the best transformation that can be done. In this paper, we explore if there are transformations that can be applied in the same pre-processing manner as homopolymer compression that would achieve better alignment sensitivity. We introduce a more general framework than homopolymer compression, called mapping-friendly sequence reductions. We transform the reference and the reads using these reductions and then apply an alignment algorithm. We demonstrate that some mapping-friendly sequence reductions lead to improved mapping accuracy, outperforming homopolymer compression.
format Online
Article
Text
id pubmed-9633736
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9633736 2022-11-05 Mapping-friendly sequence reductions: Going beyond homopolymer compression Blassel, Luc Medvedev, Paul Chikhi, Rayan iScience Article Elsevier 2022-10-13 /pmc/articles/PMC9633736/ /pubmed/36339268 http://dx.doi.org/10.1016/j.isci.2022.105305 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Mapping-friendly sequence reductions: Going beyond homopolymer compression
topic Article