Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species
N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. However, the existing experimental techniques for detecting 6mA sites ar...
Main Authors: | Li, Zutan; Jiang, Hangjin; Kong, Lingpeng; Chen, Yuanyuan; Lang, Kun; Fan, Xiaodan; Zhang, Liangyun; Pian, Cong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924747/ https://www.ncbi.nlm.nih.gov/pubmed/33600435 http://dx.doi.org/10.1371/journal.pcbi.1008767 |
_version_ | 1783659154363121664 |
---|---|
author | Li, Zutan Jiang, Hangjin Kong, Lingpeng Chen, Yuanyuan Lang, Kun Fan, Xiaodan Zhang, Liangyun Pian, Cong |
author_sort | Li, Zutan |
collection | PubMed |
description | N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. However, the existing experimental techniques for detecting 6mA sites are not cost-effective, which creates a strong need for new computational methods for this problem. In this paper, we developed a deep learning framework named Deep6mA to identify DNA 6mA sites; it requires neither prior knowledge of 6mA nor manually crafted sequence features, and its performance is superior to other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a benchmark dataset of rice gives a sensitivity and specificity for Deep6mA of 92.96% and 95.06%, respectively, and an overall prediction accuracy of 94%. Importantly, we find that sequences with 6mA sites share similar patterns across different species. The model trained with rice data accurately predicts the 6mA sites of three other species, Arabidopsis thaliana, Fragaria vesca and Rosa chinensis, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which suggests that the sequence near the 6mA site may be conserved; (2) 6mA is enriched in the TATA box of the promoter, which may be the main way it regulates downstream gene expression. |
format | Online Article Text |
id | pubmed-7924747 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7924747 2021-03-10 Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species Li, Zutan Jiang, Hangjin Kong, Lingpeng Chen, Yuanyuan Lang, Kun Fan, Xiaodan Zhang, Liangyun Pian, Cong PLoS Comput Biol Research Article Public Library of Science 2021-02-18 /pmc/articles/PMC7924747/ /pubmed/33600435 http://dx.doi.org/10.1371/journal.pcbi.1008767 Text en © 2021 Li et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924747/ https://www.ncbi.nlm.nih.gov/pubmed/33600435 http://dx.doi.org/10.1371/journal.pcbi.1008767 |
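
The description field reports sensitivity, specificity and overall accuracy for Deep6mA on the rice benchmark. As a reading aid, the sketch below shows how these three figures relate to one another. The confusion-matrix counts are hypothetical, chosen only to reproduce the reported rates on an assumed balanced set of 10,000 positive and 10,000 negative sequences; the record does not state the actual counts or class balance, and the names `Confusion`, `sensitivity`, `specificity` and `accuracy` are illustrative rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary classifier (6mA site vs. non-6mA site)."""
    tp: int  # true positives: real 6mA sites predicted as 6mA
    fn: int  # false negatives: real 6mA sites missed
    tn: int  # true negatives: non-6mA sites predicted as non-6mA
    fp: int  # false positives: non-6mA sites predicted as 6mA


def sensitivity(c: Confusion) -> float:
    # Fraction of real 6mA sites that were recovered.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Fraction of non-6mA sites that were correctly rejected.
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    # Fraction of all predictions that were correct.
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts (assumption): a balanced set of 10,000 + 10,000 sequences.
example = Confusion(tp=9296, fn=704, tn=9506, fp=494)

print(f"sensitivity = {sensitivity(example):.4f}")
print(f"specificity = {specificity(example):.4f}")
print(f"accuracy    = {accuracy(example):.4f}")
```

When the two classes are the same size, accuracy is the mean of sensitivity and specificity, which is consistent with the reported 92.96%, 95.06% and 94%. With unbalanced classes the three values would no longer be tied together this way.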