iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data
Main authors: | Park, Sehi; Rehman, Mobeen Ur; Ullah, Farman; Tayara, Hilal; Chong, Kil To |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10444964/ ; https://www.ncbi.nlm.nih.gov/pubmed/37555812 ; http://dx.doi.org/10.1093/bioinformatics/btad474 |
author | Park, Sehi; Rehman, Mobeen Ur; Ullah, Farman; Tayara, Hilal; Chong, Kil To |
---|---|
collection | PubMed |
description | MOTIVATION: The investigation of DNA methylation can shed light on the processes underlying human well-being and help determine overall human health. However, insufficient coverage makes it challenging to implement single-stranded DNA methylation sequencing technologies, highlighting the need for an efficient prediction model. Models are required to create an understanding of the underlying biological systems and to project single-cell (methylated) data accurately. RESULTS: In this study, we developed positional features for predicting CpG sites. Positional characteristics of the sequence are derived using data from CpG regions and the separation between nearby CpG sites. Multiple optimized classifiers and different ensemble learning approaches are evaluated. The OPTUNA framework is used to optimize the algorithms. The CatBoost algorithm followed by the stacking algorithm outperformed existing DNA methylation identifiers. AVAILABILITY AND IMPLEMENTATION: The data and methodologies used in this study are openly accessible to the research community. Researchers can access the positional features and algorithms used for predicting CpG site methylation patterns. To achieve superior performance, we employed the CatBoost algorithm followed by the stacking algorithm, which outperformed existing DNA methylation identifiers. The proposed iCpG-Pos approach utilizes only positional features, resulting in a substantial reduction in computational complexity compared to other known approaches for detecting CpG site methylation patterns. In conclusion, our study introduces a novel approach, iCpG-Pos, for predicting CpG site methylation patterns. By focusing on positional features, our model offers both accuracy and efficiency, making it a promising tool for advancing DNA methylation research and its applications in human health and well-being. |
format | Online Article Text |
id | pubmed-10444964 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10444964 (2023-08-24); Bioinformatics; Original Paper; Oxford University Press, 2023-08-09; http://dx.doi.org/10.1093/bioinformatics/btad474; Text, en; © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10444964/ https://www.ncbi.nlm.nih.gov/pubmed/37555812 http://dx.doi.org/10.1093/bioinformatics/btad474 |
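The abstract states only that the positional features are derived from CpG regions and from the separation between nearby CpG sites. The Python sketch below illustrates what features of that kind might look like. It rests on several assumptions that do not come from the record. Distances are measured in bases along a single strand. A simple count of neighbouring CpGs within a window stands in for "CpG region" information. The names `find_cpg_positions` and `positional_features` and the `window` parameter are hypothetical. This is not the iCpG-Pos feature set, which is defined in the paper itself.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List


def find_cpg_positions(seq: str) -> List[int]:
    """Return sorted 0-based start positions of every 'CG' dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def positional_features(seq: str, window: int = 50) -> List[Dict[str, int]]:
    """Hypothetical positional features for each CpG site (not the paper's exact set).

    For every CpG we record:
      dist_prev     - bases to the previous CpG (-1 if there is none)
      dist_next     - bases to the next CpG (-1 if there is none)
      cpg_in_window - other CpGs within +/- `window` bases, an assumed
                      stand-in for CpG-region density
    """
    positions = find_cpg_positions(seq)
    features = []
    for k, pos in enumerate(positions):
        dist_prev = pos - positions[k - 1] if k > 0 else -1
        dist_next = positions[k + 1] - pos if k + 1 < len(positions) else -1
        # positions is sorted, so binary search counts sites in [pos - window, pos + window]
        lo = bisect_left(positions, pos - window)
        hi = bisect_right(positions, pos + window)
        features.append({
            "position": pos,
            "dist_prev": dist_prev,
            "dist_next": dist_next,
            "cpg_in_window": hi - lo - 1,  # exclude the site itself
        })
    return features


if __name__ == "__main__":
    for row in positional_features("ACGTTCGACGA", window=3):
        print(row)
```

Each CpG site becomes a small numeric vector. The record reports that the authors fed their positional features to classifiers tuned with Optuna and obtained the best results with CatBoost followed by stacking. The sketch stops at feature extraction.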