iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data

MOTIVATION: The investigation of DNA methylation can shed light on the processes underlying human well-being and help determine overall human health. However, insufficient coverage makes it challenging to implement single-stranded DNA methylation sequencing technologies, highlighting the need for an...

Bibliographic Details
Main Authors: Park, Sehi, Rehman, Mobeen Ur, Ullah, Farman, Tayara, Hilal, Chong, Kil To
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10444964/
https://www.ncbi.nlm.nih.gov/pubmed/37555812
http://dx.doi.org/10.1093/bioinformatics/btad474
_version_ 1785094070159278080
author Park, Sehi
Rehman, Mobeen Ur
Ullah, Farman
Tayara, Hilal
Chong, Kil To
author_sort Park, Sehi
collection PubMed
description MOTIVATION: The investigation of DNA methylation can shed light on the processes underlying human well-being and help determine overall human health. However, insufficient coverage makes it challenging to implement single-stranded DNA methylation sequencing technologies, highlighting the need for an efficient prediction model. Models are required to create an understanding of the underlying biological systems and to project single-cell (methylated) data accurately. RESULTS: In this study, we developed positional features for predicting CpG sites. Positional characteristics of the sequence are derived using data from CpG regions and the separation between nearby CpG sites. Multiple optimized classifiers and different ensemble learning approaches are evaluated. The OPTUNA framework is used to optimize the algorithms. The CatBoost algorithm followed by the stacking algorithm outperformed existing DNA methylation identifiers. AVAILABILITY AND IMPLEMENTATION: The data and methodologies used in this study are openly accessible to the research community. Researchers can access the positional features and algorithms used for predicting CpG site methylation patterns. To achieve superior performance, we employed the CatBoost algorithm followed by the stacking algorithm, which outperformed existing DNA methylation identifiers. The proposed iCpG-Pos approach utilizes only positional features, resulting in a substantial reduction in computational complexity compared to other known approaches for detecting CpG site methylation patterns. In conclusion, our study introduces a novel approach, iCpG-Pos, for predicting CpG site methylation patterns. By focusing on positional features, our model offers both accuracy and efficiency, making it a promising tool for advancing DNA methylation research and its applications in human health and well-being.
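The abstract describes positional features built from CpG locations and the separation between nearby CpG sites. The following is a minimal sketch of that general idea, not the authors' feature definition: the function names, the `window` parameter, and the `-1` sentinel for a missing neighbour are all assumptions made for illustration.

```python
def find_cpg_positions(sequence: str) -> list[int]:
    """Return the 0-based index of the C in every CG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def neighbor_distance_features(positions: list[int], window: int = 2) -> list[list[int]]:
    """Build one feature row per CpG site from distances to nearby sites.

    For each k in 1..window the row holds the distance to the k-th
    upstream site followed by the distance to the k-th downstream site.
    A missing neighbour is encoded as -1 (an illustrative choice).
    """
    ordered = sorted(positions)
    features = []
    for idx, pos in enumerate(ordered):
        row = []
        for k in range(1, window + 1):
            upstream = pos - ordered[idx - k] if idx - k >= 0 else -1
            downstream = ordered[idx + k] - pos if idx + k < len(ordered) else -1
            row.extend([upstream, downstream])
        features.append(row)
    return features


if __name__ == "__main__":
    sites = find_cpg_positions("ACGTTCGCGA")
    for site, row in zip(sites, neighbor_distance_features(sites, window=1)):
        print(site, row)
```

Rows of this shape could then be passed to a tabular classifier such as a gradient-boosted tree model. The abstract names CatBoost and a stacking ensemble tuned with OPTUNA, but the exact features and hyperparameters are not given in this record.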
format Online
Article
Text
id pubmed-10444964
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10444964 2023-08-24 iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data Park, Sehi Rehman, Mobeen Ur Ullah, Farman Tayara, Hilal Chong, Kil To Bioinformatics Original Paper Oxford University Press 2023-08-09 /pmc/articles/PMC10444964/ /pubmed/37555812 http://dx.doi.org/10.1093/bioinformatics/btad474 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data
topic Original Paper