i6mA-DNCP: Computational Identification of DNA N(6)-Methyladenine Sites in the Rice Genome Using Optimized Dinucleotide-Based Features

Bibliographic Details
Main authors: Kong, Liang; Zhang, Lichao
Format: Online article (text)
Language: English
Published: MDPI, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6826501/
https://www.ncbi.nlm.nih.gov/pubmed/31635172
http://dx.doi.org/10.3390/genes10100828
Collection: PubMed
Abstract: DNA N(6)-methyladenine (6mA) plays an important role in regulating gene expression in eukaryotes. Accurate identification of 6mA sites may help in understanding genomic 6mA distributions and biological functions. Various experimental methods have been applied to detect 6mA sites genome-wide, but they are too time-consuming and expensive, so computational methods that can rapidly identify 6mA sites are needed. In this paper, a new machine learning-based method, i6mA-DNCP, was proposed for identifying 6mA sites in the rice genome. DNA sequences were first represented by dinucleotide composition and dinucleotide-based DNA properties. After a specially designed DNA property selection process, a bagging classifier was used to build the prediction model. The jackknife test on a benchmark dataset showed that i6mA-DNCP obtained 84.43% sensitivity, 88.86% specificity, 86.65% accuracy, a Matthews correlation coefficient (MCC) of 0.734, and an area under the receiver operating characteristic curve (AUC) of 0.926. Moreover, three independent datasets were established to assess the generalization ability of the method, and extensive experiments validated the effectiveness of i6mA-DNCP.
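The abstract names two concrete, well-defined building blocks: dinucleotide composition as a sequence encoding, and sensitivity (Sn), specificity (Sp), accuracy (Acc) and MCC as evaluation metrics. The sketch below illustrates both in plain Python. It is not the authors' code: the dinucleotide-based physicochemical properties, the property selection step, the bagging classifier and the jackknife (leave-one-out) loop are not reproduced, and the normalization by the number of overlapping positions (len(seq) - 1) is an assumption, since definitions of dinucleotide composition vary between papers. The names `dinucleotide_composition`, `binary_metrics` and `DINUCLEOTIDES` are hypothetical.

```python
import math
from itertools import product

# The 16 dinucleotides in fixed lexicographic order: AA, AC, AG, AT, CA, ..., TT
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_composition(seq):
    """Return the 16-dim frequency vector of overlapping dinucleotides.

    Assumption (hypothetical helper): frequencies are normalized by the number
    of overlapping dinucleotide positions, len(seq) - 1. Windows containing a
    non-ACGT character are skipped but still count toward the denominator.
    """
    seq = seq.upper()
    n = len(seq) - 1
    if n < 1:
        raise ValueError("sequence must contain at least two nucleotides")
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(n):
        pair = seq[i:i + 2]  # overlapping window of length 2
        if pair in counts:
            counts[pair] += 1
    return [counts[d] / n for d in DINUCLEOTIDES]


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient
    computed from the four cells of a binary confusion matrix."""
    sn = tp / (tp + fn)  # fraction of true 6mA sites that are recovered
    sp = tn / (tn + fp)  # fraction of non-6mA sites correctly rejected
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

For example, `dinucleotide_composition("GAATTC")` has five overlapping pairs (GA, AA, AT, TT, TC), so each of those entries is 0.2 and the other eleven are 0.0, while `binary_metrics(tp=8, fn=2, tn=9, fp=1)` on a toy confusion matrix gives Sn = 0.8, Sp = 0.9 and Acc = 0.85. If a benchmark is balanced (equal numbers of positives and negatives), Acc reduces to (Sn + Sp)/2; the reported figures are consistent with this, since (84.43 + 88.86)/2 ≈ 86.65.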
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Record: pubmed-6826501 (2019-11-18)
Journal: Genes (Basel)
Publication date: 2019-10-20
License: © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).