Extending Information Retrieval Methods to Personalized Genomic-Based Studies of Disease

Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual’s diseas...

Bibliographic Details
Main Authors: Ye, Shuyun; Dawson, John A; Kendziorski, Christina
Format: Online Article Text
Language: English
Published: Libertas Academica 2015
Subjects: Original Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4332045/
https://www.ncbi.nlm.nih.gov/pubmed/25733795
http://dx.doi.org/10.4137/CIN.S16354
author Ye, Shuyun
Dawson, John A
Kendziorski, Christina
collection PubMed
description Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual’s disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a “document” with “text” detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference.
format Online
Article
Text
id pubmed-4332045
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4332045 2015-03-02 Cancer Inform Original Research Libertas Academica 2015-02-10 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
title Extending Information Retrieval Methods to Personalized Genomic-Based Studies of Disease
topic Original Research