
Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species

N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. However, the existing experimental techniques for detecting 6mA sites ar...


Bibliographic Details
Main authors: Li, Zutan, Jiang, Hangjin, Kong, Lingpeng, Chen, Yuanyuan, Lang, Kun, Fan, Xiaodan, Zhang, Liangyun, Pian, Cong
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924747/
https://www.ncbi.nlm.nih.gov/pubmed/33600435
http://dx.doi.org/10.1371/journal.pcbi.1008767
_version_ 1783659154363121664
author Li, Zutan
Jiang, Hangjin
Kong, Lingpeng
Chen, Yuanyuan
Lang, Kun
Fan, Xiaodan
Zhang, Liangyun
Pian, Cong
collection PubMed
description N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. However, the existing experimental techniques for detecting 6mA sites are not cost-effective, which implies a great need for new computational methods for this problem. In this paper, we developed, without requiring any prior knowledge of 6mA or manually crafted sequence features, a deep learning framework named Deep6mA to identify DNA 6mA sites, and its performance is superior to that of other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a benchmark rice dataset gives Deep6mA a sensitivity and specificity of 92.96% and 95.06%, respectively, and an overall prediction accuracy of 94%. Importantly, we find that sequences with 6mA sites share similar patterns across different species. The model trained on rice data predicts the 6mA sites of three other species well: Arabidopsis thaliana, Fragaria vesca and Rosa chinensis, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which means the sequence near the 6mA site may be conserved; (2) 6mA is enriched in the TATA box of the promoter, which may be the main source of its regulation of downstream gene expression.
format Online
Article
Text
id pubmed-7924747
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7924747 2021-03-10 Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species Li, Zutan Jiang, Hangjin Kong, Lingpeng Chen, Yuanyuan Lang, Kun Fan, Xiaodan Zhang, Liangyun Pian, Cong PLoS Comput Biol Research Article N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. However, the existing experimental techniques for detecting 6mA sites are not cost-effective, which implies a great need for new computational methods for this problem. In this paper, we developed, without requiring any prior knowledge of 6mA or manually crafted sequence features, a deep learning framework named Deep6mA to identify DNA 6mA sites, and its performance is superior to that of other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a benchmark rice dataset gives Deep6mA a sensitivity and specificity of 92.96% and 95.06%, respectively, and an overall prediction accuracy of 94%. Importantly, we find that sequences with 6mA sites share similar patterns across different species. The model trained on rice data predicts the 6mA sites of three other species well: Arabidopsis thaliana, Fragaria vesca and Rosa chinensis, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which means the sequence near the 6mA site may be conserved; (2) 6mA is enriched in the TATA box of the promoter, which may be the main source of its regulation of downstream gene expression.
Public Library of Science 2021-02-18 /pmc/articles/PMC7924747/ /pubmed/33600435 http://dx.doi.org/10.1371/journal.pcbi.1008767 Text en © 2021 Li et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924747/
https://www.ncbi.nlm.nih.gov/pubmed/33600435
http://dx.doi.org/10.1371/journal.pcbi.1008767