
Penalized logistic regression based on [Formula: see text] penalty for high-dimensional DNA methylation data


Bibliographic Details
Main authors: Jiang, Hong-Kun, Liang, Yong
Format: Online Article Text
Language: English
Published: IOS Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7369078/
https://www.ncbi.nlm.nih.gov/pubmed/32364148
http://dx.doi.org/10.3233/THC-209016
author Jiang, Hong-Kun
Liang, Yong
collection PubMed
description BACKGROUND: DNA methylation is a molecular modification of DNA that plays a vital role in gene expression. In cancer tissues, 5’–C–phosphate–G–3’ (CpG)-rich regions are abnormally hypermethylated or hypomethylated, so methods that identify disease-associated CpG sites are useful. CpG sites within the same gene or the same CpG island are highly correlated with each other. OBJECTIVE: Based on this group effect, we proposed an efficient and accurate method for selecting pathogenic CpG sites. METHODS: Our method combines a [Formula: see text] regularized solver with a central-node fully connected network to penalize a group-constrained logistic regression model. Consequently, both sparsity and a group effect are imposed on the correlated regression coefficients. RESULTS: Extensive simulation studies compared the proposed approach with existing mainstream regularization methods in terms of classification accuracy and stability. In the simulations, the proposed approach attained greater predictive accuracy than previous methods. The method was also applied to over 20000 CpG sites and verified using ovarian cancer data generated with the Illumina Infinium HumanMethylation 27K BeadChip. On this real dataset, the indicators of predictive accuracy were higher than those of previous methods, and more genes containing the selected CpG sites were confirmed as pathogenic. Additionally, the method selected fewer CpG sites in total than the other methods while showing higher accuracy rates on both the simulated and the DNA methylation data. CONCLUSION: The proposed method offers researchers in DNA methylation a powerful tool for recognizing pathogenic CpG sites.
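The METHODS sentence above describes a logistic regression that is penalized for sparsity and for a group effect at the same time. The following is a minimal sketch of that general idea, not the authors' algorithm. The record gives the sparsity penalty only as "[Formula: see text]", so an L1 penalty solved by soft-thresholding is swapped in as a stand-in. The group effect is approximated with the Laplacian of a graph that fully connects features sharing a group label, which is also an assumption. All function names, parameter names, and default values are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (the stand-in sparsity penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def group_laplacian(groups):
    """Laplacian of a graph linking every pair of features with the same label.

    Assumption: this stands in for the network penalty named in the abstract.
    """
    groups = np.asarray(groups)
    adj = (groups[:, None] == groups[None, :]).astype(float)
    np.fill_diagonal(adj, 0.0)
    return np.diag(adj.sum(axis=1)) - adj


def objective(X, y, beta, lap, lam_sparse, lam_group):
    """Mean logistic loss + lam_group * beta' L beta + lam_sparse * ||beta||_1."""
    z = X @ beta
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    return loss + lam_group * beta @ lap @ beta + lam_sparse * np.abs(beta).sum()


def fit(X, y, groups, lam_sparse=0.05, lam_group=0.1, n_iter=500):
    """Proximal gradient descent on the penalized objective above."""
    n, p = X.shape
    lap = group_laplacian(groups)
    # Upper bound on the Lipschitz constant of the smooth part's gradient.
    lip = (np.linalg.norm(X, 2) ** 2 / (4.0 * n)
           + 2.0 * lam_group * np.linalg.eigvalsh(lap).max())
    step = 1.0 / lip
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (prob - y) / n + 2.0 * lam_group * (lap @ beta)
        beta = soft_threshold(beta - step * grad, step * lam_sparse)
    return beta
```

Calling `fit(X, y, groups)` with an n-by-p feature matrix, 0/1 labels, and one group label per feature (for example, the gene or CpG island of each site) returns a coefficient vector. The Laplacian term pulls coefficients within a group toward each other, while the soft-thresholding step can set weak coefficients exactly to zero.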
format Online
Article
Text
id pubmed-7369078
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher IOS Press
record_format MEDLINE/PubMed
spelling pubmed-7369078 2020-07-22 Jiang, Hong-Kun Liang, Yong Technol Health Care Research Article IOS Press 2020-06-04 /pmc/articles/PMC7369078/ /pubmed/32364148 http://dx.doi.org/10.3233/THC-209016 Text en © 2020 – IOS Press and the authors. All rights reserved https://creativecommons.org/licenses/by-nc/4.0/ This article is published online with Open Access and distributed under the terms of the Creative Commons Attribution Non-Commercial License (CC BY-NC 4.0).
title Penalized logistic regression based on [Formula: see text] penalty for high-dimensional DNA methylation data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7369078/
https://www.ncbi.nlm.nih.gov/pubmed/32364148
http://dx.doi.org/10.3233/THC-209016