Extending Information Retrieval Methods to Personalized Genomic-Based Studies of Disease
Main authors: | Ye, Shuyun; Dawson, John A; Kendziorski, Christina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4332045/ https://www.ncbi.nlm.nih.gov/pubmed/25733795 http://dx.doi.org/10.4137/CIN.S16354 |
_version_ | 1782357846667558912 |
---|---|
author | Ye, Shuyun; Dawson, John A; Kendziorski, Christina |
collection | PubMed |
description | Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual’s disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a “document” with “text” detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference. |
format | Online Article Text |
id | pubmed-4332045 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-4332045, 2015-03-02. Cancer Inform, Original Research. Libertas Academica, 2015-02-10. © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License. |
title | Extending Information Retrieval Methods to Personalized Genomic-Based Studies of Disease |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4332045/ https://www.ncbi.nlm.nih.gov/pubmed/25733795 http://dx.doi.org/10.4137/CIN.S16354 |
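The description above treats each patient as a "document" whose "text" lists clinical events and genomic state. The following Python sketch illustrates only that encoding idea. It builds bag-of-words documents for a few hypothetical patients and fits ordinary, unsupervised LDA by collapsed Gibbs sampling. The feature tokens (such as `TP53_mut` or `platinum_resistant`) and the patient names are invented and are not study data. The helper names `encode` and `lda_gibbs` are hypothetical. The priors `alpha` and `beta`, the number of topics, and the iteration count are arbitrary assumptions. This is not survLDA: the supervision by a time-to-event response is deliberately left out.

```python
import random

# Hypothetical toy cohort (invented for illustration, NOT study data): each
# patient is a "document" whose "words" are clinical events and genomic states.
PATIENTS = {
    "patient_1": ["TP53_mut", "BRCA1_mut", "stage_III", "platinum_sensitive"],
    "patient_2": ["TP53_mut", "BRCA1_mut", "platinum_sensitive", "no_recurrence"],
    "patient_3": ["TP53_mut", "CCNE1_amp", "stage_IV", "platinum_resistant"],
    "patient_4": ["CCNE1_amp", "stage_IV", "platinum_resistant", "recurrence"],
    "patient_5": ["BRCA1_mut", "stage_III", "platinum_sensitive", "no_recurrence"],
    "patient_6": ["TP53_mut", "CCNE1_amp", "platinum_resistant", "recurrence"],
}


def encode(patients):
    """Map each patient's feature tokens to integer word ids (hypothetical helper)."""
    vocab = sorted({w for doc in patients.values() for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    docs = [[index[w] for w in patients[p]] for p in sorted(patients)]
    return vocab, docs


def lda_gibbs(docs, n_vocab, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Plain unsupervised LDA via collapsed Gibbs sampling (no survival supervision)."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]            # topic counts per patient
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # token counts per topic
    nk = [0] * n_topics                             # total tokens per topic
    z = []                                          # topic assignment of each token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token's assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_vocab * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Point estimates from the final sample: patient-topic (theta), topic-token (phi).
    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[t][w] + beta) / (nk[t] + n_vocab * beta) for w in range(n_vocab)]
        for t in topics
    ]
    return theta, phi


vocab, docs = encode(PATIENTS)
theta, phi = lda_gibbs(docs, len(vocab), n_topics=2)
for t, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=row.__getitem__, reverse=True)[:3]
    print(f"topic {t}:", [vocab[w] for w in top])
for name, mix in zip(sorted(PATIENTS), theta):
    print(name, [round(x, 2) for x in mix])
```

Each row of `theta` is one patient's estimated mixture over topics. Each row of `phi` is one topic's distribution over clinical and genomic tokens. A survival-supervised variant would additionally link the topic proportions to time-to-event outcomes, and this sketch does not attempt that step.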