Cargando…
Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data
BACKGROUND: In human genetic association studies with high-dimensional gene expression data, it has been well known that statistical selection methods utilizing prior biological network knowledge such as genetic pathways and signaling pathways can outperform other methods that ignore genetic network...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2019
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805595/ https://www.ncbi.nlm.nih.gov/pubmed/31640538 http://dx.doi.org/10.1186/s12859-019-3040-x |
Sumario: | BACKGROUND: In human genetic association studies with high-dimensional gene expression data, it has been well known that statistical selection methods utilizing prior biological network knowledge such as genetic pathways and signaling pathways can outperform other methods that ignore genetic network structures in terms of true positive selection. In recent epigenetic research on case-control association studies, relatively many statistical methods have been proposed to identify cancer-related CpG sites and their corresponding genes from high-dimensional DNA methylation array data. However, most of existing methods are not designed to utilize genetic network information although methylation levels between linked genes in the genetic networks tend to be highly correlated with each other. RESULTS: We propose new approach that combines data dimension reduction techniques with network-based regularization to identify outcome-related genes for analysis of high-dimensional DNA methylation data. In simulation studies, we demonstrated that the proposed approach overwhelms other statistical methods that do not utilize genetic network information in terms of true positive selection. We also applied it to the 450K DNA methylation array data of the four breast invasive carcinoma cancer subtypes from The Cancer Genome Atlas (TCGA) project. CONCLUSIONS: The proposed variable selection approach can utilize prior biological network information for analysis of high-dimensional DNA methylation array data. It first captures gene level signals from multiple CpG sites using data a dimension reduction technique and then performs network-based regularization based on biological network graph information. It can select potentially cancer-related genes and genetic pathways that were missed by the existing methods. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3040-x) contains supplementary material, which is available to authorized users. |
---|