
Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation

DNA N4-methylcytosine (4mC) is an important genetic modification and plays crucial roles in differentiation between self and non-self DNA and in controlling DNA replication, cell cycle, and gene-expression levels. Accurate 4mC site identification is fundamental to improve the understanding of 4mC bi...


Bibliographic Details
Main authors: Manavalan, Balachandran; Basith, Shaherin; Shin, Tae Hwan; Wei, Leyi; Lee, Gwang
Format: Online Article Text
Language: English
Published: American Society of Gene & Cell Therapy 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6540332/
https://www.ncbi.nlm.nih.gov/pubmed/31146255
http://dx.doi.org/10.1016/j.omtn.2019.04.019
author Manavalan, Balachandran
Basith, Shaherin
Shin, Tae Hwan
Wei, Leyi
Lee, Gwang
collection PubMed
description DNA N4-methylcytosine (4mC) is an important genetic modification that plays crucial roles in distinguishing self from non-self DNA and in controlling DNA replication, the cell cycle, and gene-expression levels. Accurate 4mC site identification is fundamental to improving the understanding of 4mC biological functions and mechanisms. Hence, it is necessary to develop in silico approaches for efficient, high-throughput 4mC site identification. Although some bioinformatic tools have been developed for this purpose, their prediction accuracy and generalizability require improvement before they are usable in practical applications. To this end, we propose Meta-4mCpred, a meta-predictor for 4mC site prediction. In Meta-4mCpred, we employed a feature representation learning scheme and generated 56 probabilistic features based on four different machine-learning algorithms and seven feature encodings covering diverse sequence information, including compositional, physicochemical, and position-specific information. The probabilistic features were then used as input to a support vector machine to develop the final meta-predictor. To the best of our knowledge, this is the first meta-predictor for 4mC site prediction. Cross-validation results show that Meta-4mCpred achieved an overall average accuracy of 84.2% across six different species, which is ∼2%–4% higher than that attainable using state-of-the-art predictors. Furthermore, Meta-4mCpred achieved an overall average accuracy of 86% in independent dataset evaluation, which is over 4% higher than that of state-of-the-art predictors. A user-friendly web server implementing Meta-4mCpred is freely accessible at http://thegleelab.org/Meta-4mCpred.
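The feature representation scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two encoders, the toy scoring functions standing in for trained base learners, and the example sequence are all hypothetical, chosen only to show how one probability per (encoding, learner) pair might be collected into a single vector. The final classifier (a support vector machine in the abstract) is not implemented here.

```python
import math
from itertools import product
from typing import Callable, Dict, List

BASES = "ACGT"


def kmer_composition(seq: str, k: int = 2) -> List[float]:
    """Compositional encoding: normalized frequency of every possible k-mer."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    windows = len(seq) - k + 1
    for i in range(windows):
        sub = seq[i:i + k]
        if sub in counts:  # skip windows containing non-ACGT symbols
            counts[sub] += 1
    total = max(windows, 1)
    return [counts[kmer] / total for kmer in kmers]


def binary_profile(seq: str) -> List[float]:
    """Position-specific encoding: a one-hot vector of length 4 per position."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def toy_model(weight: float) -> Callable[[List[float]], float]:
    """Hypothetical stand-in for a trained base learner (assumption, not a real model)."""
    def predict_proba(x: List[float]) -> float:
        score = weight * (sum(x) / len(x))
        return 1.0 / (1.0 + math.exp(-score))  # logistic squashing into (0, 1)
    return predict_proba


def meta_features(
    seq: str,
    encoders: Dict[str, Callable[[str], List[float]]],
    base_models: Dict[str, List[Callable[[List[float]], float]]],
) -> List[float]:
    """Collect one predicted probability per (encoding, base model) pair."""
    features = []
    for name, encode in encoders.items():
        x = encode(seq)
        for predict_proba in base_models[name]:
            features.append(predict_proba(x))
    return features


# Hypothetical setup: two encodings, two toy learners per encoding.
ENCODERS = {"kmer": kmer_composition, "binary": binary_profile}
BASE_MODELS = {
    "kmer": [toy_model(1.0), toy_model(-1.0)],
    "binary": [toy_model(2.0), toy_model(-2.0)],
}

vector = meta_features("ACGTACGT", ENCODERS, BASE_MODELS)
print(len(vector), vector)
```

In a real pipeline each toy function would be replaced by a learner trained on the corresponding encoding, and the resulting vector would be passed to the final classifier.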
format Online
Article
Text
id pubmed-6540332
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher American Society of Gene & Cell Therapy
record_format MEDLINE/PubMed
spelling pubmed-6540332 2019-06-03 Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation Manavalan, Balachandran; Basith, Shaherin; Shin, Tae Hwan; Wei, Leyi; Lee, Gwang Mol Ther Nucleic Acids Article American Society of Gene & Cell Therapy 2019-04-30 /pmc/articles/PMC6540332/ /pubmed/31146255 http://dx.doi.org/10.1016/j.omtn.2019.04.019 Text en © 2019 The Author(s) This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation
topic Article