Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia
Gene expression is a combinatorial function of genetic/epigenetic factors such as copy number variation (CNV), DNA methylation (DM), transcription factor (TF) occupancy, and microRNA (miRNA) post-transcriptional regulation. With the maturation of microarray/sequencing technologies, large amounts of dat...
Main Authors: | Li, Yue; Liang, Minggao; Zhang, Zhaolei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207489/ https://www.ncbi.nlm.nih.gov/pubmed/25340776 http://dx.doi.org/10.1371/journal.pcbi.1003908 |
_version_ | 1782340974250295296 |
---|---|
author | Li, Yue Liang, Minggao Zhang, Zhaolei |
author_facet | Li, Yue Liang, Minggao Zhang, Zhaolei |
author_sort | Li, Yue |
collection | PubMed |
description | Gene expression is a combinatorial function of genetic/epigenetic factors such as copy number variation (CNV), DNA methylation (DM), transcription factor (TF) occupancy, and microRNA (miRNA) post-transcriptional regulation. With the maturation of microarray/sequencing technologies, large amounts of data measuring the genome-wide signals of those factors have become available from the Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA). However, an integrative model that takes full advantage of these rich yet heterogeneous data is lacking. To this end, we developed RACER (Regression Analysis of Combined Expression Regulation), which fits mRNA expression as the response, using as explanatory variables the TF data from ENCODE and the CNV, DM, and miRNA expression signals from TCGA. Briefly, RACER first infers the sample-specific regulatory activities of TFs and miRNAs, which are then used as inputs to infer specific TF/miRNA-gene interactions. Such a two-stage regression framework circumvents a common difficulty in integrating ENCODE data measured in generic cell lines with the sample-specific TCGA measurements. As a case study, we integrated Acute Myeloid Leukemia (AML) data from TCGA and the related TF binding data measured in K562 from ENCODE. As a proof of concept, we first verified our model formalism by 10-fold cross-validation on predicting gene expression. We next evaluated RACER on recovering known regulatory interactions, and demonstrated its superior statistical power over existing methods in detecting known miRNA/TF targets. Additionally, we developed a feature selection procedure, which identified 18 regulators whose activities clustered consistently with cytogenetic risk groups. One of the selected regulators is miR-548p, whose inferred targets were significantly enriched for leukemia-related pathways, implicating a novel role for it in AML pathogenesis. Moreover, survival analysis using the inferred activities identified C-Fos as a potential AML prognostic marker. Together, we provide a novel framework that successfully integrates the TCGA and ENCODE data to reveal the AML-specific regulatory program at a global level. |
format | Online Article Text |
id | pubmed-4207489 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4207489 2014-10-27 Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia Li, Yue Liang, Minggao Zhang, Zhaolei PLoS Comput Biol Research Article Public Library of Science 2014-10-23 /pmc/articles/PMC4207489/ /pubmed/25340776 http://dx.doi.org/10.1371/journal.pcbi.1003908 Text en © 2014 Li et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Li, Yue Liang, Minggao Zhang, Zhaolei Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia |
title | Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia |
title_sort | regression analysis of combined gene expression regulation in acute myeloid leukemia |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207489/ https://www.ncbi.nlm.nih.gov/pubmed/25340776 http://dx.doi.org/10.1371/journal.pcbi.1003908 |
work_keys_str_mv | AT liyue regressionanalysisofcombinedgeneexpressionregulationinacutemyeloidleukemia AT liangminggao regressionanalysisofcombinedgeneexpressionregulationinacutemyeloidleukemia AT zhangzhaolei regressionanalysisofcombinedgeneexpressionregulationinacutemyeloidleukemia |
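
The description in this record outlines a two-stage regression: sample-specific TF and miRNA activities are inferred first, and those activities then serve as inputs for inferring gene-specific TF/miRNA interactions. The following is a minimal sketch of that idea, not the authors' implementation. It assumes ordinary least squares in both stages (the record does not state which estimator RACER uses), and every function name, array shape, and the way miRNA target indicators are multiplied by miRNA expression is a hypothetical choice made for illustration.

```python
import numpy as np


def fit_least_squares(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefs).

    Assumption: plain OLS stands in for whatever estimator the paper uses.
    """
    design = np.column_stack([np.ones(X.shape[0]), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def stage_one(expr, cnv, dm, tf_binding, mir_targets, mir_expr):
    """Stage 1 (hypothetical layout): one regression per sample, across genes.

    expr, cnv, dm : (genes, samples) sample-specific measurements
    tf_binding    : (genes, tfs) generic cell-line binding signal
    mir_targets   : (genes, mirnas) target indicator or score
    mir_expr      : (mirnas, samples) miRNA expression
    Returns tf_activity (tfs, samples) and mir_activity (mirnas, samples).
    """
    n_samples = expr.shape[1]
    n_tfs = tf_binding.shape[1]
    n_mirs = mir_targets.shape[1]
    tf_activity = np.zeros((n_tfs, n_samples))
    mir_activity = np.zeros((n_mirs, n_samples))
    for s in range(n_samples):
        # Scale each miRNA's target column by its expression in sample s.
        mir_features = mir_targets * mir_expr[:, s]
        X = np.column_stack([cnv[:, s], dm[:, s], tf_binding, mir_features])
        _, coefs = fit_least_squares(X, expr[:, s])
        # The first two coefficients belong to CNV and DM.
        tf_activity[:, s] = coefs[2:2 + n_tfs]
        mir_activity[:, s] = coefs[2 + n_tfs:]
    return tf_activity, mir_activity


def stage_two(expr, cnv, dm, tf_activity, mir_activity):
    """Stage 2 (hypothetical layout): one regression per gene, across samples.

    Returns weights of shape (genes, tfs + mirnas): the fitted association
    between each regulator's inferred activity and each gene's expression.
    """
    n_genes = expr.shape[0]
    activities = np.vstack([tf_activity, mir_activity]).T  # (samples, regulators)
    weights = np.zeros((n_genes, activities.shape[1]))
    for g in range(n_genes):
        X = np.column_stack([cnv[g], dm[g], activities])
        _, coefs = fit_least_squares(X, expr[g])
        weights[g] = coefs[2:]  # drop the CNV and DM coefficients
    return weights
```

In this sketch the TF binding matrix is shared by all samples, while the stage-one coefficients differ per sample. That is one way a generic cell-line measurement can be combined with sample-specific data, which is the difficulty the description says the two-stage design addresses.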