BiLSTM-5mC: A Bidirectional Long Short-Term Memory-Based Approach for Predicting 5-Methylcytosine Sites in Genome-Wide DNA Promoters

An important cause of cancer proliferation is the change in DNA methylation patterns, characterized by localized hypermethylation of the promoters of tumor-suppressor genes together with an overall decrease in the level of 5-methylcytosine (5mC). Therefore, identifying the 5mC sites in the prom...

Bibliographic Details
Main authors: Cheng, Xin; Wang, Jun; Li, Qianyue; Liu, Taigang
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8704614/
https://www.ncbi.nlm.nih.gov/pubmed/34946497
http://dx.doi.org/10.3390/molecules26247414
_version_ 1784621749007024128
author Cheng, Xin
Wang, Jun
Li, Qianyue
Liu, Taigang
collection PubMed
description An important cause of cancer proliferation is the change in DNA methylation patterns, characterized by localized hypermethylation of the promoters of tumor-suppressor genes together with an overall decrease in the level of 5-methylcytosine (5mC). Therefore, identifying 5mC sites in promoters is a critical step towards understanding the diverse functions of DNA methylation in genetic diseases such as cancers and aging. However, most wet-lab experimental techniques for detecting 5mC sites are time-consuming and laborious. In this study, we proposed a deep learning-based approach, called BiLSTM-5mC, for accurately identifying 5mC sites in genome-wide DNA promoters. First, we randomly divided the negative samples into 11 subsets of equal size, each of which forms a balanced subset when combined with the same number of positive samples. Then, two types of feature vectors, encoded by the one-hot method and by the nucleotide property and frequency (NPF) method, were fed into a bidirectional long short-term memory (BiLSTM) network followed by a fully connected layer to train 22 submodels. Finally, the outputs of these submodels were integrated by majority vote to predict 5mC sites. Our experimental results demonstrated that BiLSTM-5mC outperformed existing methods on the same independent dataset.
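The data handling described in the abstract (splitting the negatives into 11 balanced subsets, one-hot encoding, and majority voting) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `balanced_subsets`, `one_hot`, and `majority_vote` are hypothetical, every negative subset is assumed to be paired with the full set of positives, unknown bases are assumed to map to an all-zero vector, and ties in the vote are assumed to resolve to the negative class (the abstract does not specify tie handling). The NPF encoding and the BiLSTM network itself are omitted.

```python
import random
from collections import Counter

# Assumed one-hot alphabet order (A, C, G, T); the abstract does not fix it.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def balanced_subsets(positives, negatives, n_subsets=11, seed=0):
    """Shuffle the negatives, cut them into n_subsets equal parts, and
    pair each part with all positives (assumption: positives are reused)."""
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_subsets  # leftover negatives are dropped here
    return [
        (list(positives), shuffled[i * size:(i + 1) * size])
        for i in range(n_subsets)
    ]


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-dimensional one-hot tuples.
    Unknown symbols (e.g. N) become all zeros, which is an assumption."""
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in seq.upper()]


def majority_vote(votes):
    """Combine 0/1 predictions from the submodels.
    A tie is resolved to 0 (negative), which is an assumption."""
    counts = Counter(votes)
    return 1 if counts[1] > counts[0] else 0


# Toy usage: 3 positives and 33 negatives give 11 subsets of 3 negatives.
positives = ["ACGT", "CCGG", "TCGA"]
negatives = ["TTTT"] * 33
subsets = balanced_subsets(positives, negatives)
```

With two encodings per balanced subset, 11 subsets give the 22 submodels mentioned in the abstract; because 22 is even, an 11 to 11 tie is possible, so any real implementation needs an explicit tie rule.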
format Online
Article
Text
id pubmed-8704614
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8704614 2021-12-25 Molecules Article MDPI 2021-12-07 /pmc/articles/PMC8704614/ /pubmed/34946497 http://dx.doi.org/10.3390/molecules26247414 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title BiLSTM-5mC: A Bidirectional Long Short-Term Memory-Based Approach for Predicting 5-Methylcytosine Sites in Genome-Wide DNA Promoters
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8704614/
https://www.ncbi.nlm.nih.gov/pubmed/34946497
http://dx.doi.org/10.3390/molecules26247414