Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data

BACKGROUND: In human genetic association studies with high-dimensional gene expression data, it is well known that statistical selection methods utilizing prior biological network knowledge such as genetic pathways and signaling pathways can outperform other methods that ignore genetic network...

Bibliographic Details
Main authors: Kim, Kipoong; Sun, Hokeun
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805595/
https://www.ncbi.nlm.nih.gov/pubmed/31640538
http://dx.doi.org/10.1186/s12859-019-3040-x
author Kim, Kipoong
Sun, Hokeun
collection PubMed
description BACKGROUND: In human genetic association studies with high-dimensional gene expression data, it is well known that statistical selection methods utilizing prior biological network knowledge such as genetic pathways and signaling pathways can outperform other methods that ignore genetic network structures in terms of true positive selection. In recent epigenetic research on case-control association studies, a relatively large number of statistical methods have been proposed to identify cancer-related CpG sites and their corresponding genes from high-dimensional DNA methylation array data. However, most existing methods are not designed to utilize genetic network information, although methylation levels of linked genes in the genetic networks tend to be highly correlated with each other. RESULTS: We propose a new approach that combines data dimension reduction techniques with network-based regularization to identify outcome-related genes for analysis of high-dimensional DNA methylation data. In simulation studies, we demonstrated that the proposed approach outperforms other statistical methods that do not utilize genetic network information in terms of true positive selection. We also applied it to the 450K DNA methylation array data of the four subtypes of breast invasive carcinoma from The Cancer Genome Atlas (TCGA) project. CONCLUSIONS: The proposed variable selection approach can utilize prior biological network information for analysis of high-dimensional DNA methylation array data. It first captures gene-level signals from multiple CpG sites using a data dimension reduction technique and then performs network-based regularization based on biological network graph information. It can select potentially cancer-related genes and genetic pathways that were missed by the existing methods.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3040-x) contains supplementary material, which is available to authorized users.
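The two-step idea in the description (gene-level summaries from multiple CpG sites, then network-based regularization) can be illustrated with a minimal sketch. The record does not say which dimension reduction technique or penalty the authors use, so everything below is an assumption for illustration only: the first principal component stands in for the dimension reduction step, and an L1 penalty plus a normalized graph-Laplacian quadratic penalty stands in for the network-based regularization. The function names, the tuning values lam1 and lam2, and the synthetic data are hypothetical and are not taken from the article.

```python
import numpy as np

def gene_level_scores(X, gene_of_cpg):
    """Summarize each gene's CpG columns by a first principal component score.
    (Assumption: PCA is used here only as one possible dimension reduction.)"""
    labels = np.asarray(gene_of_cpg)
    genes = sorted(set(gene_of_cpg))
    scores = []
    for g in genes:
        block = X[:, labels == g]
        block = block - block.mean(axis=0)          # center each CpG column
        u, s, _ = np.linalg.svd(block, full_matrices=False)
        score = u[:, 0] * s[0]                      # first PC score per sample
        # PCA signs are arbitrary; align with the gene's average methylation.
        if np.dot(score, block.mean(axis=1)) < 0:
            score = -score
        scores.append(score)
    Z = np.column_stack(scores)
    return genes, (Z - Z.mean(axis=0)) / Z.std(axis=0)

def normalized_laplacian(A):
    """Normalized graph Laplacian; isolated nodes get an all-zero row/column."""
    d = A.sum(axis=1)
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    return np.diag((d > 0).astype(float)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

def fit_network_logistic(Z, y, A, lam1=0.05, lam2=0.1, n_iter=2000):
    """Proximal gradient descent for logistic loss
    + lam2 * beta' L beta (network smoothness) + lam1 * ||beta||_1 (sparsity)."""
    n, p = Z.shape
    L = normalized_laplacian(A)
    design = np.column_stack([np.ones(n), Z])
    # Upper bound on the Lipschitz constant of the smooth part of the objective.
    lip = np.linalg.norm(design, 2) ** 2 / (4 * n) + 2 * lam2 * np.linalg.eigvalsh(L)[-1]
    step = 1.0 / lip
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + Z @ beta)))
        resid = prob - y
        grad = Z.T @ resid / n + 2 * lam2 * (L @ beta)
        b0 -= step * resid.mean()                   # unpenalized intercept
        beta = beta - step * grad
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam1, 0.0)  # soft-threshold
    return b0, beta

# Synthetic demo (hypothetical data): 5 genes with 3 CpG sites each.
rng = np.random.default_rng(0)
n, n_genes, n_cpg = 200, 5, 3
latent = rng.normal(size=(n, n_genes))
X = np.repeat(latent, n_cpg, axis=1) + 0.1 * rng.normal(size=(n, n_genes * n_cpg))
gene_of_cpg = [f"g{j}" for j in range(n_genes) for _ in range(n_cpg)]
y = (latent[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)  # g0 drives the outcome

A = np.zeros((n_genes, n_genes))                    # toy network: g0 - g1 - g2
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.0

genes, Z = gene_level_scores(X, gene_of_cpg)
b0, beta = fit_network_logistic(Z, y, A, lam1=0.01, lam2=0.1)
selected = [g for g, b in zip(genes, beta) if b != 0.0]
```

In this sketch, lam1 controls how many genes are selected and lam2 controls how strongly coefficients of linked genes are pulled toward each other; both would normally be chosen by cross-validation or a similar procedure rather than fixed by hand.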
format Online
Article
Text
id pubmed-6805595
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6805595 2019-10-24 Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data Kim, Kipoong Sun, Hokeun BMC Bioinformatics Methodology Article BioMed Central 2019-10-22 /pmc/articles/PMC6805595/ /pubmed/31640538 http://dx.doi.org/10.1186/s12859-019-3040-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data
topic Methodology Article