Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet
DNA cytosine methylation is an important epigenomic mark with a wide range of functions in many organisms. Whole genome bisulfite sequencing is the gold standard to interrogate cytosine methylation genome-wide. Algorithms used to map bisulfite-converted reads often encode the four-base DNA alphabet...
Main Authors: | de Sena Brandine, Guilherme; Smith, Andrew D |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | High Throughput Sequencing Methods |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8693577/ https://www.ncbi.nlm.nih.gov/pubmed/34988438 http://dx.doi.org/10.1093/nargab/lqab115 |
_version_ | 1784619170438053888 |
---|---|
author | de Sena Brandine, Guilherme Smith, Andrew D |
author_facet | de Sena Brandine, Guilherme Smith, Andrew D |
author_sort | de Sena Brandine, Guilherme |
collection | PubMed |
description | DNA cytosine methylation is an important epigenomic mark with a wide range of functions in many organisms. Whole genome bisulfite sequencing is the gold standard to interrogate cytosine methylation genome-wide. Algorithms used to map bisulfite-converted reads often encode the four-base DNA alphabet with three letters by reducing two bases to a common letter. This encoding substantially reduces the entropy of nucleotide frequencies in the resulting reference genome. Within the paradigm of read mapping by first filtering possible candidate alignments, reduced entropy in the sequence space can increase the required computing effort. We introduce another bisulfite mapping algorithm (abismal), based on the idea of encoding a four-letter DNA sequence as only two letters, one for purines and one for pyrimidines. We show that this encoding can lead to greater specificity compared to existing encodings used to map bisulfite sequencing reads. Through the two-letter encoding, the abismal software tool maps reads in less time and using less memory than most bisulfite sequencing read mapping software tools, while attaining similar accuracy. This allows in silico methylation analysis to be performed in a wider range of computing machines with limited hardware settings. |
format | Online Article Text |
id | pubmed-8693577 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-86935772022-01-04 Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet de Sena Brandine, Guilherme Smith, Andrew D NAR Genom Bioinform High Throughput Sequencing Methods DNA cytosine methylation is an important epigenomic mark with a wide range of functions in many organisms. Whole genome bisulfite sequencing is the gold standard to interrogate cytosine methylation genome-wide. Algorithms used to map bisulfite-converted reads often encode the four-base DNA alphabet with three letters by reducing two bases to a common letter. This encoding substantially reduces the entropy of nucleotide frequencies in the resulting reference genome. Within the paradigm of read mapping by first filtering possible candidate alignments, reduced entropy in the sequence space can increase the required computing effort. We introduce another bisulfite mapping algorithm (abismal), based on the idea of encoding a four-letter DNA sequence as only two letters, one for purines and one for pyrimidines. We show that this encoding can lead to greater specificity compared to existing encodings used to map bisulfite sequencing reads. Through the two-letter encoding, the abismal software tool maps reads in less time and using less memory than most bisulfite sequencing read mapping software tools, while attaining similar accuracy. This allows in silico methylation analysis to be performed in a wider range of computing machines with limited hardware settings. Oxford University Press 2021-12-22 /pmc/articles/PMC8693577/ /pubmed/34988438 http://dx.doi.org/10.1093/nargab/lqab115 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | High Throughput Sequencing Methods de Sena Brandine, Guilherme Smith, Andrew D Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet |
title | Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet |
title_full | Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet |
title_fullStr | Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet |
title_full_unstemmed | Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet |
title_short | Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet |
title_sort | fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet |
topic | High Throughput Sequencing Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8693577/ https://www.ncbi.nlm.nih.gov/pubmed/34988438 http://dx.doi.org/10.1093/nargab/lqab115 |
work_keys_str_mv | AT desenabrandineguilherme fastandmemoryefficientmappingofshortbisulfitesequencingreadsusingatwoletteralphabet AT smithandrewd fastandmemoryefficientmappingofshortbisulfitesequencingreadsusingatwoletteralphabet |