
Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet

DNA cytosine methylation is an important epigenomic mark with a wide range of functions in many organisms. Whole genome bisulfite sequencing is the gold standard to interrogate cytosine methylation genome-wide. Algorithms used to map bisulfite-converted reads often encode the four-base DNA alphabet...


Bibliographic Details
Main authors: de Sena Brandine, Guilherme, Smith, Andrew D
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8693577/
https://www.ncbi.nlm.nih.gov/pubmed/34988438
http://dx.doi.org/10.1093/nargab/lqab115
author de Sena Brandine, Guilherme
Smith, Andrew D
collection PubMed
description DNA cytosine methylation is an important epigenomic mark with a wide range of functions in many organisms. Whole genome bisulfite sequencing is the gold standard to interrogate cytosine methylation genome-wide. Algorithms used to map bisulfite-converted reads often encode the four-base DNA alphabet with three letters by reducing two bases to a common letter. This encoding substantially reduces the entropy of nucleotide frequencies in the resulting reference genome. Within the paradigm of read mapping by first filtering possible candidate alignments, reduced entropy in the sequence space can increase the required computing effort. We introduce another bisulfite mapping algorithm (abismal), based on the idea of encoding a four-letter DNA sequence as only two letters, one for purines and one for pyrimidines. We show that this encoding can lead to greater specificity compared to existing encodings used to map bisulfite sequencing reads. Through the two-letter encoding, the abismal software tool maps reads in less time and using less memory than most bisulfite sequencing read mapping software tools, while attaining similar accuracy. This allows in silico methylation analysis to be performed in a wider range of computing machines with limited hardware settings.
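The two encodings contrasted in the description can be illustrated with a minimal Python sketch. This is not taken from the abismal source code; the function names and the letters R and Y are hypothetical choices made for illustration. It shows that a fully bisulfite-converted read (every unmethylated C read as T) still matches its reference under both the three-letter and the two-letter encoding.

```python
# Illustrative sketch only; names are hypothetical, not abismal's API.

# Three-letter encoding: C and T are reduced to a common letter (here T).
THREE_LETTER = str.maketrans("C", "T")

# Two-letter encoding: purines (A, G) -> R, pyrimidines (C, T) -> Y.
TWO_LETTER = str.maketrans("ACGT", "RYRY")


def encode_three_letter(seq: str) -> str:
    """Collapse C into T, leaving A and G unchanged."""
    return seq.upper().translate(THREE_LETTER)


def encode_two_letter(seq: str) -> str:
    """Map each base to its purine (R) or pyrimidine (Y) class."""
    return seq.upper().translate(TWO_LETTER)


reference = "ACGTTCGA"
# The same locus after bisulfite conversion of every C to T.
converted_read = "ATGTTTGA"

# C->T conversion stays within the pyrimidine class, so the encoded
# read and the encoded reference are identical in both schemes.
print(encode_three_letter(reference), encode_three_letter(converted_read))
print(encode_two_letter(reference), encode_two_letter(converted_read))
```

The G-to-A changes seen on the complementary strand stay within the purine class in the same way, so a single purine/pyrimidine encoding of the reference can serve both read types in this sketch.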
format Online
Article
Text
id pubmed-8693577
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8693577 2022-01-04 Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet. de Sena Brandine, Guilherme; Smith, Andrew D. NAR Genom Bioinform. High Throughput Sequencing Methods. Oxford University Press 2021-12-22 /pmc/articles/PMC8693577/ /pubmed/34988438 http://dx.doi.org/10.1093/nargab/lqab115 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet
topic High Throughput Sequencing Methods