A Predictive Framework for Integrating Disparate Genomic Data Types Using Sample-Specific Gene Set Enrichment Analysis and Multi-Task Learning

Bibliographic Details
Main authors: Bennett, Brian D., Xiong, Qing, Mukherjee, Sayan, Furey, Terrence S.
Format: Online Article Text
Language: English
Published: Public Library of Science, 2012
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441565/
https://www.ncbi.nlm.nih.gov/pubmed/23028573
http://dx.doi.org/10.1371/journal.pone.0044635
Collection: PubMed
Description: Understanding the root molecular and genetic causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. In this study, we developed a framework to integrate gene expression and genotype data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genotypic variation. This framework utilizes pathway analysis and multi-task learning to build a predictive model and discover pathways relevant to the complex trait of interest. We simulated expression and genotype data to test the predictive ability of our framework and to measure how well it uncovered pathways with genes both differentially expressed and genetically associated with a complex trait. We found that the predictive performance of the multi-task model was comparable to other similar methods. Also, methods like multi-task learning that considered enrichment analysis scores from both data sets found pathways with both genetic and expression differences related to the phenotype. We used our framework to analyze differences between estrogen receptor (ER) positive and negative breast cancer samples. An analysis of the top 15 gene sets from the multi-task model showed they were all related to estrogen, steroids, cell signaling, or the cell cycle. Although our study suggests that multi-task learning does not enhance predictive accuracy, the models generated by our framework do provide valuable biological pathway knowledge for complex traits.
Record ID: pubmed-3441565
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article)
Publication date: 2012-09-13
Copyright: © 2012 Bennett et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.