i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes

N4-methylcytosine (4mC) is one of the most important DNA modifications and is involved in regulating cell differentiation and gene expression. Accurate identification of 4mC sites is necessary to understand various biological functions. In this work, we developed a new computational predictor cal...

Bibliographic Details
Main Authors: Hasan, Md. Mehedi, Manavalan, Balachandran, Shoombuatong, Watshara, Khatun, Mst. Shamima, Kurata, Hiroyuki
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7168350/
https://www.ncbi.nlm.nih.gov/pubmed/32322372
http://dx.doi.org/10.1016/j.csbj.2020.04.001
author Hasan, Md. Mehedi
Manavalan, Balachandran
Shoombuatong, Watshara
Khatun, Mst. Shamima
Kurata, Hiroyuki
collection PubMed
description N4-methylcytosine (4mC) is one of the most important DNA modifications and is involved in regulating cell differentiation and gene expression. Accurate identification of 4mC sites is necessary to understand various biological functions. In this work, we developed a new computational predictor called i4mC-Mouse to identify 4mC sites in the mouse genome. Six encoding schemes that cover different characteristics of DNA sequence information were explored: k-space nucleotide composition (KSNC), k-mer nucleotide composition (Kmer), mono nucleotide binary encoding (MBE), dinucleotide binary encoding, electron–ion interaction pseudo potentials (EIIP), and dinucleotide physicochemical composition. We then built six random forest (RF)-based encoding models and linearly combined their probability scores to construct the final predictor. Of the six RF-based models, the Kmer, KSNC, MBE, and EIIP encodings were sufficient, contributing 10%, 45%, 25%, and 20% of the prediction performance, respectively. On the independent test, i4mC-Mouse predicted 4mC sites with an accuracy of 0.816 and an MCC of 0.633, approximately 2.5% and 5% higher, respectively, than those of the existing method (4mCpred-EL). For experimental biologists, a freely available web application was implemented at http://kurata14.bio.kyutech.ac.jp/i4mC-Mouse/.
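The description above names the encodings and the linear combination of per-model probability scores but gives no implementation detail. What follows is a minimal sketch, not the authors' code. It shows two of the named encodings (Kmer and MBE) and a weighted sum of model scores. All function names are hypothetical. Reading the reported 10%, 45%, 25%, and 20% contributions as linear weights is an assumption made for illustration only, and the per-model scores are placeholders for the outputs of trained random forest models.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_composition(seq, k=2):
    """Return the frequency of each possible k-mer (4**k values, lexicographic order)."""
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows containing non-ACGT symbols are skipped
            counts[window] += 1
    return [counts[m] / total for m in kmers]


def mono_binary_encoding(seq):
    """One-hot encode each nucleotide as 4 bits; unknown symbols become all zeros."""
    one_hot = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    features = []
    for base in seq:
        features.extend(one_hot.get(base, [0, 0, 0, 0]))
    return features


# ASSUMPTION: the reported contributions are treated here as linear weights.
ASSUMED_WEIGHTS = {"Kmer": 0.10, "KSNC": 0.45, "MBE": 0.25, "EIIP": 0.20}


def combine_scores(scores, weights=ASSUMED_WEIGHTS):
    """Weighted linear combination of per-encoding probability scores."""
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    # Placeholder scores standing in for four trained models' positive-class probabilities.
    example_scores = {"Kmer": 0.6, "KSNC": 0.8, "MBE": 0.7, "EIIP": 0.5}
    print(combine_scores(example_scores))
```

In this sketch each encoding would feed its own classifier, and only the resulting probabilities are combined, so an encoding can be dropped by omitting its weight.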
format Online
Article
Text
id pubmed-7168350
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-7168350 2020-04-22 Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2020-04-08 /pmc/articles/PMC7168350/ /pubmed/32322372 http://dx.doi.org/10.1016/j.csbj.2020.04.001 Text en © 2020 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes
topic Research Article