Predicting gene expression using DNA methylation in three human populations

Bibliographic Details
Main authors: Zhong, Huan; Kim, Soyeon; Zhi, Degui; Cui, Xiangqin
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6500370/
https://www.ncbi.nlm.nih.gov/pubmed/31106051
http://dx.doi.org/10.7717/peerj.6757
author Zhong, Huan
Kim, Soyeon
Zhi, Degui
Cui, Xiangqin
collection PubMed
description BACKGROUND: DNA methylation, an important epigenetic mark, is well known for its regulatory role in gene expression, especially its negative correlation with expression in the promoter region. However, its correlation with gene expression across the genome at the human population level has not been well studied. In particular, it is unclear whether the genome-wide DNA methylation profile of an individual can predict her/his gene expression profile. Previous studies were mostly limited to association analyses between the methylation of single CpG sites and gene expression. It is not known whether the DNA methylation of a gene has enough prediction power to serve as a surrogate for its expression in existing human study cohorts that have DNA samples but not RNA samples. RESULTS: We examined DNA methylation in the gene region for predicting gene expression across individuals in non-cancer tissues of three human population datasets: adipose tissue from the Multiple Tissue Human Expression Resource Projects (MuTHER), peripheral blood mononuclear cells (PBMC) from participants in an asthma and normal control study, and lymphoblastoid cell lines (LCL) from healthy individuals. Three prediction models were investigated: single linear regression, multiple linear regression, and least absolute shrinkage and selection operator (LASSO) penalized regression. Our results showed that LASSO regression performed best among these methods. However, the prediction power is generally low and varies across datasets. Only 30 and 42 genes were found to have a cross-validation R² greater than 0.3 in the PBMC and adipose datasets, respectively. A substantially larger number of genes (258) was identified in the LCL dataset, which was generated from a more homogeneous cell line sample source. We also demonstrated that keeping all CpG probes, rather than excluding probes because of cross-hybridization or SNP effects, gives better prediction power.
CONCLUSION: In our analyses of three populations, DNA methylation of CpG sites in the gene region has limited power to predict gene expression across individuals with linear regression models. The prediction power potentially varies depending on tissue, cell type, and data source. In our analyses, combining LASSO regression with all probes on the methylation array, without excluding any, provides the best prediction of gene expression.
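The comparison of the three model types in the abstract can be illustrated with a minimal Python (NumPy) sketch. It assumes per-gene models in which `X` holds methylation beta values of the CpG sites in one gene region (rows are individuals) and `y` holds that gene's expression. Several choices here are illustrative assumptions and are not taken from the authors' code:

- the function names;
- the fixed LASSO penalty `lam`;
- using the single CpG most correlated with expression for "single linear regression";
- cross-validated R² computed from pooled out-of-fold predictions.

The demo data are synthetic.

```python
import numpy as np


def kfold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def fit_ols(X, y):
    """Multiple linear regression (least squares with intercept); returns (intercept, coefs)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def fit_single(X, y):
    """Single linear regression on the one CpG most correlated with y.
    Assumption: the CpG is chosen from the training data only, to avoid leakage."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    best = int(np.argmax(corr))
    b0, b = fit_ols(X[:, [best]], y)
    coefs = np.zeros(X.shape[1])
    coefs[best] = b[0]
    return b0, coefs


def fit_lasso(X, y, lam=0.05, n_iter=200):
    """LASSO via cyclic coordinate descent on standardized CpG columns Z, minimizing
    (1 / (2n)) * ||y - mean(y) - Z b||^2 + lam * ||b||_1.
    Coefficients are mapped back to the original scale."""
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                      # constant CpGs get a zero coefficient
    Z = (X - mu) / sd
    b = np.zeros(p)
    r = y - y.mean()                       # residual for b = 0
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * b[j]            # drop CpG j's current contribution
            rho = Z[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-thresholding
            r -= Z[:, j] * b[j]
    coefs = b / sd
    return y.mean() - mu @ coefs, coefs


def cv_r2(X, y, fit, k=5, seed=0):
    """Cross-validated R^2 = 1 - SS_res / SS_tot over pooled out-of-fold predictions."""
    pred = np.empty_like(y, dtype=float)
    for train, test in kfold_splits(len(y), k, seed):
        b0, coefs = fit(X[train], y[train])
        pred[test] = b0 + X[test] @ coefs
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def n_predictable(r2_by_gene, threshold=0.3):
    """Count genes whose cross-validated R^2 exceeds the threshold."""
    return sum(1 for r2 in r2_by_gene.values() if r2 > threshold)


if __name__ == "__main__":
    # Synthetic toy data (not from the study): 80 individuals x 15 CpG sites for one gene.
    rng = np.random.default_rng(1)
    X = rng.uniform(0, 1, size=(80, 15))   # toy methylation beta values between 0 and 1
    y = 2.0 - 1.5 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(0, 0.3, size=80)
    for name, fit in [("single", fit_single), ("multiple", fit_ols), ("lasso", fit_lasso)]:
        print(name, round(cv_r2(X, y, fit), 3))
```

In a genome-wide run, `cv_r2` would be computed once per gene. `n_predictable` then counts genes above the 0.3 cutoff used in the abstract. In practice `lam` would normally be tuned, for example by nested cross-validation, rather than fixed as it is here.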
format Online
Article
Text
id pubmed-6500370
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6500370 2019-05-17 PeerJ Bioinformatics PeerJ Inc. 2019-05-01 /pmc/articles/PMC6500370/ /pubmed/31106051 http://dx.doi.org/10.7717/peerj.6757 Text en ©2019 Zhong et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Predicting gene expression using DNA methylation in three human populations
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6500370/
https://www.ncbi.nlm.nih.gov/pubmed/31106051
http://dx.doi.org/10.7717/peerj.6757