Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia


Bibliographic Details
Main Authors: Li, Yue; Liang, Minggao; Zhang, Zhaolei
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207489/
https://www.ncbi.nlm.nih.gov/pubmed/25340776
http://dx.doi.org/10.1371/journal.pcbi.1003908
author Li, Yue
Liang, Minggao
Zhang, Zhaolei
author_sort Li, Yue
collection PubMed
description Gene expression is a combinatorial function of genetic and epigenetic factors such as copy number variation (CNV), DNA methylation (DM), transcription factor (TF) occupancy, and microRNA (miRNA) post-transcriptional regulation. With the maturation of microarray and sequencing technologies, large amounts of data measuring the genome-wide signals of those factors have become available from the Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA). However, an integrative model that takes full advantage of these rich yet heterogeneous data is lacking. To this end, we developed RACER (Regression Analysis of Combined Expression Regulation), which fits mRNA expression as the response, using as explanatory variables the TF data from ENCODE and the CNV, DM, and miRNA expression signals from TCGA. Briefly, RACER first infers the sample-specific regulatory activities of TFs and miRNAs, which are then used as inputs to infer specific TF/miRNA-gene interactions. Such a two-stage regression framework circumvents a common difficulty in integrating ENCODE data measured in a generic cell line with the sample-specific TCGA measurements. As a case study, we integrated Acute Myeloid Leukemia (AML) data from TCGA and the related TF binding data measured in K562 from ENCODE. As a proof of concept, we first verified our model formalism by 10-fold cross-validation on predicting gene expression. We next evaluated RACER on recovering known regulatory interactions, and demonstrated its superior statistical power over existing methods in detecting known miRNA/TF targets. Additionally, we developed a feature selection procedure, which identified 18 regulators whose activities clustered consistently with cytogenetic risk groups. One of the selected regulators is miR-548p, whose inferred targets were significantly enriched for leukemia-related pathways, implicating a novel role for it in AML pathogenesis. Moreover, survival analysis using the inferred activities identified C-Fos as a potential AML prognostic marker. Together, we provided a novel framework that successfully integrated the TCGA and ENCODE data in revealing the AML-specific regulatory program at a global level.
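The two-stage idea in the description can be illustrated with a minimal sketch on synthetic data. This is not the RACER implementation. Plain least squares is swapped in because the abstract does not name a fitting method. The miRNA terms are left out for brevity. Every name, array shape, and simulated coefficient below is a hypothetical choice made for illustration. Stage 1 regresses expression across genes within each sample to estimate per-sample TF activities. Stage 2 regresses expression across samples within each gene on those activities to estimate gene-specific TF weights.

```python
import numpy as np


def stage1_activities(expr, cnv, dm, tf_binding):
    """Stage 1 (sketch): one regression per sample, across genes.

    expr, cnv, dm: arrays of shape (n_genes, n_samples).
    tf_binding: array of shape (n_genes, n_tfs), shared by all samples
    (it stands in for binding measured in a generic cell line).
    Returns an (n_samples, n_tfs) array of estimated TF activities.
    """
    n_genes, n_samples = expr.shape
    activities = np.empty((n_samples, tf_binding.shape[1]))
    for s in range(n_samples):
        # Columns: intercept, CNV, DM, then one column per TF.
        X = np.column_stack([np.ones(n_genes), cnv[:, s], dm[:, s], tf_binding])
        coef, *_ = np.linalg.lstsq(X, expr[:, s], rcond=None)
        activities[s] = coef[3:]  # keep only the TF coefficients
    return activities


def stage2_interactions(expr, cnv, dm, activities):
    """Stage 2 (sketch): one regression per gene, across samples.

    Uses the stage 1 activities as predictors.
    Returns an (n_genes, n_tfs) array of gene-specific TF weights.
    """
    n_genes, n_samples = expr.shape
    weights = np.empty((n_genes, activities.shape[1]))
    for g in range(n_genes):
        X = np.column_stack([np.ones(n_samples), cnv[g], dm[g], activities])
        coef, *_ = np.linalg.lstsq(X, expr[g], rcond=None)
        weights[g] = coef[3:]
    return weights


# Synthetic data; all sizes and effect sizes are assumptions.
rng = np.random.default_rng(0)
n_genes, n_samples, n_tfs = 200, 30, 4
cnv = rng.normal(size=(n_genes, n_samples))
dm = rng.normal(size=(n_genes, n_samples))
tf_binding = rng.normal(size=(n_genes, n_tfs))
true_act = rng.normal(size=(n_samples, n_tfs))
noise = 0.05 * rng.normal(size=(n_genes, n_samples))
expr = 0.5 * cnv - 0.3 * dm + tf_binding @ true_act.T + noise

activities = stage1_activities(expr, cnv, dm, tf_binding)
weights = stage2_interactions(expr, cnv, dm, activities)
```

In this toy setup the stage 1 estimates track the simulated activities, and the stage 2 weights track the simulated binding matrix. Real data would likely need a regularized fit and held-out evaluation, such as the 10-fold cross-validation the description mentions.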
format Online
Article
Text
id pubmed-4207489
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4207489 2014-10-27 Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia Li, Yue Liang, Minggao Zhang, Zhaolei PLoS Comput Biol Research Article Public Library of Science 2014-10-23 /pmc/articles/PMC4207489/ /pubmed/25340776 http://dx.doi.org/10.1371/journal.pcbi.1003908 Text en © 2014 Li et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia
topic Research Article