GeneTopics - interpretation of gene sets via literature-driven topic models

Bibliographic details
Main authors:	Wang, Vicky; Xi, Li; Enayetallah, Ahmed; Fauman, Eric; Ziemek, Daniel
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029197/ https://www.ncbi.nlm.nih.gov/pubmed/24564875 http://dx.doi.org/10.1186/1752-0509-7-S5-S10

collection	PubMed
description	BACKGROUND: Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, do not capture relevant biological themes, or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set.
METHODS: Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic, allowing us to eliminate non-specific topics. As a result, we obtain a set of literature topics in which each topic is associated with a subset of the input genes, providing directly interpretable keywords and corresponding documents for literature research.
RESULTS: We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets: (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by genome-wide association (GWA) studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner.
CONCLUSIONS: Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly tied to relevant articles in the literature. Extending a general topic model method, the approach introduced here establishes a workflow for the interpretation of gene sets generated from diverse experimental scenarios that can complement the classical approach of comparison to reference gene sets.
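The classical approach that GeneTopics is positioned to complement, comparing an input gene set against a library of labelled gene sets, is commonly implemented as a one-sided hypergeometric over-representation test. The sketch below illustrates only that baseline, not the GeneTopics topic-model method. The function names `hypergeom_sf` and `enrichment`, the toy gene identifiers, and the choice of test are illustrative assumptions, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population size M, n labelled genes, N drawn genes)."""
    upper = min(n, N)
    total = comb(M, N)
    # Sum the exact probability mass from k up to the largest possible overlap.
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def enrichment(input_genes, gene_sets, universe):
    """Rank labelled gene sets by one-sided over-representation p-value (hypothetical helper)."""
    universe = set(universe)
    query = set(input_genes) & universe  # only genes in the background universe count
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = query & members
        p = hypergeom_sf(len(overlap), len(universe), len(members), len(query))
        results.append((name, len(overlap), p))
    # Smallest p-value (strongest over-representation) first.
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    universe = [f"G{i}" for i in range(1, 21)]
    library = {
        "set_A": [f"G{i}" for i in range(1, 6)],
        "set_B": [f"G{i}" for i in range(10, 16)],
    }
    for name, overlap, p in enrichment(["G1", "G2", "G3", "G4", "G11"], library, universe):
        print(name, overlap, f"{p:.4g}")
```

With this toy library, `set_A` (4 of 5 input genes) ranks far above `set_B` (1 of 5). Such a test can only report sets that already exist in the library, which is the limitation the abstract's BACKGROUND describes.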
id	pubmed-4029197
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Syst Biol (BioMed Central), published 2013-12-09
rights	Copyright © 2013 Wang et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
