Penalized logistic regression based on [Formula: see text] penalty for high-dimensional DNA methylation data

Bibliographic Details
Main authors: Jiang, Hong-Kun, Liang, Yong
Format: Online Article Text
Language: English
Published: IOS Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7369078/
https://www.ncbi.nlm.nih.gov/pubmed/32364148
http://dx.doi.org/10.3233/THC-209016
Description
Summary: BACKGROUND: DNA methylation is a molecular modification of DNA that plays a vital role in gene expression. In cancer tissues, 5’–C–phosphate–G–3’ (CpG)-rich regions are abnormally hypermethylated or hypomethylated, so it is useful to identify disease-associated CpG sites with dedicated methods. CpG sites within the same gene or the same CpG island are highly correlated with each other. OBJECTIVE: Based on this group effect, we proposed an efficient and accurate method for selecting pathogenic CpG sites. METHODS: Our method combines a [Formula: see text] regularized solver with a central-node fully connected network to penalize a group-constrained logistic regression model. As a result, both sparsity and a group effect are imposed on the correlated regression coefficients. RESULTS: Extensive simulation studies compared the proposed approach with existing mainstream regularization methods in terms of classification accuracy and stability, and the proposed approach attained greater predictive accuracy than the previous methods. The method was also applied to more than 20,000 CpG sites and verified on ovarian cancer data generated with the Illumina Infinium HumanMethylation 27K BeadChip. On this real dataset, the indicators of predictive accuracy were higher than those of previous methods, and more of the genes containing the selected CpG sites were confirmed to be pathogenic. In addition, the method selected fewer CpG sites in total than the other methods while achieving higher accuracy on both the simulated and the DNA methylation data. CONCLUSION: The proposed method can be a powerful tool for researchers seeking to recognize pathogenic CpG sites in DNA methylation data.
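The idea in METHODS, a logistic regression penalized for both sparsity and a group effect, can be illustrated with a minimal sketch. It rests on several assumptions that do not come from this record. The record elides the penalty's formula ([Formula: see text]), so an ordinary L1 penalty stands in for it. The "central node fully connected network" is approximated by a graph Laplacian that fully connects features sharing a group label. The solver is plain proximal gradient descent. All function names, parameter values, and the synthetic data are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def group_laplacian(groups):
    """Laplacian of a graph that fully connects features sharing a group label.

    Assumption: this stands in for the network described in the abstract.
    """
    groups = np.asarray(groups)
    adj = (groups[:, None] == groups[None, :]).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return np.diag(adj.sum(axis=1)) - adj

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (assumed stand-in for the elided penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_network_logistic(X, y, groups, lam_sparse=0.05, lam_net=0.1, n_iter=500):
    """Minimize mean logistic loss + lam_net * b'Lb + lam_sparse * ||b||_1."""
    n, p = X.shape
    L = group_laplacian(groups)
    # Upper bound on the Lipschitz constant of the smooth part's gradient.
    lipschitz = (0.25 * np.linalg.norm(X, 2) ** 2 / n
                 + 2.0 * lam_net * np.linalg.eigvalsh(L).max())
    step = 1.0 / lipschitz
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Gradient of the logistic loss plus the quadratic network penalty.
        grad = X.T @ (sigmoid(X @ beta) - y) / n + 2.0 * lam_net * (L @ beta)
        # Gradient step followed by the L1 proximal step (sparsity).
        beta = soft_threshold(beta - step * grad, step * lam_sparse)
    return beta

# Synthetic demonstration data (not methylation data, purely illustrative).
rng = np.random.default_rng(0)
n, p = 200, 12
groups = np.repeat(np.arange(4), 3)  # four hypothetical genes, three sites each
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 1.5                  # only the first group carries signal
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)

beta = fit_network_logistic(X, y, groups)
print("indices of nonzero coefficients:", np.flatnonzero(beta))
```

The quadratic Laplacian term pulls coefficients within a group toward each other, while the soft-thresholding step sets small coefficients exactly to zero. Together these give the two properties the abstract names, under the stated substitutions.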