
4acCPred: Weakly supervised prediction of N(4)-acetyldeoxycytosine DNA modification from sequences

DNA methylation is one of the earliest and most extensively studied epigenetic regulatory mechanisms; it is critical for normal development and gene expression and is implicated in disease. N4-acetyldeoxycytosine (4acC), a recently identified chemical modification of DNA, was shown to be abundant in Arabidopsis and strongly associated with gene expression and actively transcribed genes.


Bibliographic Details
Main Authors: Zhou, Jingxian, Wang, Xuan, Wei, Zhen, Meng, Jia, Huang, Daiyun
Format: Online Article Text
Language: English
Published: American Society of Gene & Cell Therapy 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9636570/
https://www.ncbi.nlm.nih.gov/pubmed/36381577
http://dx.doi.org/10.1016/j.omtn.2022.10.004
author Zhou, Jingxian
Wang, Xuan
Wei, Zhen
Meng, Jia
Huang, Daiyun
collection PubMed
description DNA methylation is one of the earliest and most extensively studied epigenetic regulatory mechanisms; it is critical for normal development and gene expression and is implicated in disease. N4-acetyldeoxycytosine (4acC), a recently identified chemical modification of DNA, was shown to be abundant in Arabidopsis and strongly associated with gene expression and actively transcribed genes. Precise identification of 4acC is essential for studying its biological function. We propose 4acCPred, the first computational framework for predicting 4acC-carrying regions from Arabidopsis genomic DNA sequences. Because existing 4acC data do not resolve the modification to a specific base but only report regions hundreds of bases long, we formulated the task as a weakly supervised learning problem and built 4acCPred on a multi-instance deep neural network. Cross-validation and independent testing on four datasets obtained under different conditions showed promising performance, with mean areas under the receiver operating characteristic curve (AUCs) of 0.9877 and 0.9899, respectively. 4acCPred also supports motif mining through model interpretation. The motifs it finds are consistent with existing knowledge, indicating that the model captures real biological signals. In addition, a user-friendly web server was built to facilitate 4acC prediction, motif visualization, and data access. The framework and web server should serve as useful tools for 4acC research.
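The description frames 4acC prediction as multi-instance learning: each reported region (hundreds of bases) is a "bag" carrying a single label, and the model has to work out which shorter sub-sequences carry the signal. The Python below is a minimal, hypothetical sketch of that framing, not the authors' implementation. The window length (41), stride (10), the toy motif "CCA", and the logistic `toy_instance_scorer` standing in for the deep network are all illustrative assumptions. Max pooling and noisy-OR are shown as two common multi-instance aggregation choices, without implying which one 4acCPred uses. The sketch also computes ROC AUC, the metric quoted above, as the probability that a positive region outscores a negative one.

```python
import math
from typing import Callable, List, Sequence


def split_into_instances(region: str, window: int = 41, stride: int = 10) -> List[str]:
    """Cut one labelled region (the bag) into overlapping fixed-length windows (the instances)."""
    if len(region) <= window:
        return [region]
    starts = list(range(0, len(region) - window + 1, stride))
    # Add a final window so the tail of the region is always covered.
    if starts[-1] != len(region) - window:
        starts.append(len(region) - window)
    return [region[s:s + window] for s in starts]


def bag_probability(instance_probs: Sequence[float], pooling: str = "max") -> float:
    """Aggregate per-window probabilities into one region-level probability."""
    if pooling == "max":
        # The region is as positive as its most positive window.
        return max(instance_probs)
    if pooling == "noisy_or":
        # The region is negative only if every window is negative.
        all_negative = 1.0
        for p in instance_probs:
            all_negative *= 1.0 - p
        return 1.0 - all_negative
    raise ValueError(f"unknown pooling: {pooling}")


def toy_instance_scorer(window: str, motif: str = "CCA") -> float:
    """Hypothetical stand-in for the deep network: logistic score of a motif count."""
    count = window.count(motif)  # counts non-overlapping occurrences
    return 1.0 / (1.0 + math.exp(-(2.0 * count - 3.0)))


def predict_region(
    region: str,
    scorer: Callable[[str], float] = toy_instance_scorer,
    window: int = 41,
    stride: int = 10,
    pooling: str = "max",
) -> float:
    """Score every window of a region, then pool to a single region-level prediction."""
    instances = split_into_instances(region, window, stride)
    return bag_probability([scorer(x) for x in instances], pooling)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    positive = "AT" * 50 + "CCACCACCA" + "AT" * 50  # toy region with a short signal in the middle
    negative = "AT" * 100                           # toy region with no signal
    scores = [predict_region(positive), predict_region(negative)]
    print([round(s, 3) for s in scores])
    print(roc_auc([1, 0], scores))
```

In a trained model of this kind, the loss is computed only on the pooled region-level probability, so the per-window scores are learned without base-level labels. Inspecting which windows score highest is one way such a model can point to candidate motifs. Replacing `toy_instance_scorer` with a trained network leaves the rest of the sketch unchanged.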
format Online
Article
Text
id pubmed-9636570
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Society of Gene & Cell Therapy
record_format MEDLINE/PubMed
journal Mol Ther Nucleic Acids (Original Article), published 2022-10-14; record date 2022-11-14
rights © 2022 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title 4acCPred: Weakly supervised prediction of N(4)-acetyldeoxycytosine DNA modification from sequences
topic Original Article