
On the prediction of non-CG DNA methylation using machine learning

DNA methylation can be detected and measured using sequencing instruments after sodium bisulfite conversion, but experiments can be expensive for large eukaryotic genomes. Sequencing nonuniformity and mapping biases can leave parts of the genome with low or no coverage, thus hampering the ability of...


Bibliographic Details
Main authors: Sereshki, Saleh, Lee, Nathan, Omirou, Michalis, Fasoula, Dionysia, Lonardi, Stefano
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10189801/
https://www.ncbi.nlm.nih.gov/pubmed/37206627
http://dx.doi.org/10.1093/nargab/lqad045
_version_ 1785043161434816512
author Sereshki, Saleh
Lee, Nathan
Omirou, Michalis
Fasoula, Dionysia
Lonardi, Stefano
collection PubMed
description DNA methylation can be detected and measured using sequencing instruments after sodium bisulfite conversion, but experiments can be expensive for large eukaryotic genomes. Sequencing nonuniformity and mapping biases can leave parts of the genome with low or no coverage, thus hampering the ability to obtain DNA methylation levels for all cytosines. To address these limitations, several computational methods have been proposed that can predict DNA methylation from the DNA sequence around the cytosine or from the methylation level of nearby cytosines. However, most of these methods are entirely focused on CG methylation in humans and other mammals. In this work, we study, for the first time, the problem of predicting cytosine methylation for CG, CHG and CHH contexts on six plant species, either from the DNA primary sequence around the cytosine or from the methylation levels of neighboring cytosines. In this framework, we also study the cross-species prediction problem and the cross-context prediction problem (within the same species). Finally, we show that providing gene and repeat annotations allows existing classifiers to significantly improve their prediction accuracy. We introduce a new classifier called AMPS (annotation-based methylation prediction from sequence) that takes advantage of genomic annotations to achieve higher accuracy.
format Online
Article
Text
id pubmed-10189801
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10189801 2023-05-18 On the prediction of non-CG DNA methylation using machine learning Sereshki, Saleh Lee, Nathan Omirou, Michalis Fasoula, Dionysia Lonardi, Stefano NAR Genom Bioinform Standard Article Oxford University Press 2023-05-17 /pmc/articles/PMC10189801/ /pubmed/37206627 http://dx.doi.org/10.1093/nargab/lqad045 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title On the prediction of non-CG DNA methylation using machine learning
topic Standard Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10189801/
https://www.ncbi.nlm.nih.gov/pubmed/37206627
http://dx.doi.org/10.1093/nargab/lqad045
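
The description in this record refers to three cytosine sequence contexts (CG, CHG and CHH, where H is the IUPAC code for A, C or T) and to prediction from the DNA sequence around a cytosine. The following is a minimal Python sketch of those two ideas, not the authors' AMPS classifier: it labels the context of a forward-strand cytosine and one-hot encodes a fixed window around it. The function names, the window parameter `flank` and the all-zero encoding for unknown or out-of-range bases are assumptions made for illustration only.

```python
from typing import List, Optional

BASES = "ACGT"
H_BASES = {"A", "C", "T"}  # IUPAC code H: any base except G


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for the forward-strand cytosine at seq[i].

    Returns None if seq[i] is not a cytosine, if the sequence ends before
    the context can be determined, or if an ambiguous base (e.g. N) follows.
    """
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if seq[i + 1] not in H_BASES or i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":
        return "CHG"
    if seq[i + 2] in H_BASES:
        return "CHH"
    return None


def one_hot_window(seq: str, i: int, flank: int) -> List[List[int]]:
    """One-hot encode the 2*flank+1 bases centred on position i.

    Positions outside the sequence and non-ACGT bases become all-zero rows
    (an illustrative choice, not taken from the paper).
    """
    seq = seq.upper()
    rows = []
    for j in range(i - flank, i + flank + 1):
        base = seq[j] if 0 <= j < len(seq) else "N"
        rows.append([1 if base == b else 0 for b in BASES])
    return rows
```

A matrix produced by `one_hot_window` is the kind of input a sequence-based classifier could consume. Cytosines on the reverse strand would need the reverse complement of the sequence, which this sketch does not handle.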