Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet

Bibliographic Details
Main authors:	de Sena Brandine, Guilherme; Smith, Andrew D
Format:	Online article (text)
Language:	English
Published:	Oxford University Press, 2021
Subjects:	High Throughput Sequencing Methods
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8693577/ https://www.ncbi.nlm.nih.gov/pubmed/34988438 http://dx.doi.org/10.1093/nargab/lqab115

DNA cytosine methylation is an important epigenomic mark with a wide range of functions in many organisms. Whole-genome bisulfite sequencing is the gold standard for interrogating cytosine methylation genome-wide. Algorithms that map bisulfite-converted reads often encode the four-base DNA alphabet with three letters by reducing two bases to a common letter (for example C and T, because unmethylated cytosines are read as thymines after bisulfite conversion). This encoding substantially reduces the entropy of nucleotide frequencies in the resulting reference genome. Within the paradigm of read mapping that first filters possible candidate alignments, reduced entropy in the sequence space can increase the required computing effort. We introduce another bisulfite mapping algorithm (abismal), based on the idea of encoding a four-letter DNA sequence as only two letters, one for purines and one for pyrimidines. We show that this encoding can lead to greater specificity than existing encodings used to map bisulfite sequencing reads. Through the two-letter encoding, the abismal software tool maps reads in less time and with less memory than most bisulfite sequencing read mapping tools, while attaining similar accuracy. This allows in silico methylation analysis to be performed on a wider range of computers, including machines with limited hardware.
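
The encodings contrasted in the abstract can be illustrated with a minimal Python sketch. It is not taken from the abismal source code: the function names (`three_letter`, `two_letter`, `shannon_entropy`, `pack_two_letter`) and the toy sequence are hypothetical. It assumes complete bisulfite conversion, so every C on the converted strand reads as T, and a conversion on the opposite strand appears as G→A in forward-strand coordinates. The sketch checks that the purine/pyrimidine encoding absorbs both conversions, while a C→T encoding absorbs only one, and it reports the per-symbol entropy of letter frequencies under each alphabet.

```python
from collections import Counter
from math import log2

# Hypothetical illustration; not the abismal implementation.
# Assumes complete bisulfite conversion: every C on the converted strand reads as T.

# Three-letter (C->T) encoding: C and T collapse to a single letter, T.
C_TO_T = str.maketrans("C", "T")
# Two-letter encoding: purines (A, G) -> R, pyrimidines (C, T) -> Y.
PURINE_PYRIMIDINE = str.maketrans("AGCT", "RRYY")


def three_letter(seq: str) -> str:
    """Encode a DNA string with the C->T three-letter alphabet."""
    return seq.upper().translate(C_TO_T)


def two_letter(seq: str) -> str:
    """Encode a DNA string as purines (R) and pyrimidines (Y)."""
    return seq.upper().translate(PURINE_PYRIMIDINE)


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per symbol, of the letter frequencies in seq."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def pack_two_letter(kmer: str) -> int:
    """Pack a two-letter k-mer into an integer using one bit per base (R=0, Y=1)."""
    value = 0
    for symbol in kmer:
        value = (value << 1) | (1 if symbol == "Y" else 0)
    return value


if __name__ == "__main__":
    ref = "ACGTTGCAGCCATGACGTAC"  # toy sequence, not real genome data
    # Forward-strand read with every C converted to T.
    read_fwd = ref.replace("C", "T")
    # Opposite-strand conversion, seen in forward-strand coordinates: G -> A.
    read_rev = ref.replace("G", "A")

    # The two-letter encoding absorbs both conversions with a single reference.
    print(two_letter(read_fwd) == two_letter(ref))      # True
    print(two_letter(read_rev) == two_letter(ref))      # True
    # The C->T encoding absorbs only the forward-strand conversion.
    print(three_letter(read_fwd) == three_letter(ref))  # True
    print(three_letter(read_rev) == three_letter(ref))  # False

    for name, encoded in [("four-letter", ref),
                          ("three-letter", three_letter(ref)),
                          ("two-letter", two_letter(ref))]:
        print(f"{name}: {shannon_entropy(encoded):.3f} bits/symbol")

    # A 12-base seed fits in 12 bits, versus 24 bits with a 2-bit-per-base code.
    print(pack_two_letter(two_letter(ref[:12])))
```

On this toy sequence the entropies print as 1.985, 1.500 and 1.000 bits/symbol. Per-symbol entropy is bounded by log2 of the alphabet size (2, about 1.585 and 1 bit), so the raw numbers are not directly comparable across alphabets. Three-letter mappers commonly handle the opposite strand with a second, G→A-converted copy of the reference, which the single two-letter encoding does not need.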

Journal:	NAR Genomics and Bioinformatics (NAR Genom Bioinform)
Published online:	2021-12-22
Rights:	© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.