Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model

BACKGROUND: Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotatio...

Bibliographic Details
Main authors:	He, Xin; Sarma, Moushumi Sen; Ling, Xu; Chee, Brant; Zhai, Chengxiang; Schatz, Bruce
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Research article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2885378/ https://www.ncbi.nlm.nih.gov/pubmed/20487560 http://dx.doi.org/10.1186/1471-2105-11-272

author	He, Xin; Sarma, Moushumi Sen; Ling, Xu; Chee, Brant; Zhai, Chengxiang; Schatz, Bruce
collection	PubMed
description	BACKGROUND: Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). However, the annotation of genes is a labor-intensive process; and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered. RESULTS: We propose a statistical method that uses the primary literature, i.e. free-text, as the source to perform overrepresentation analysis. The method is based on a statistical framework of mixture model and addresses the methodological flaws in several existing programs. We implemented this method within a literature mining system, BeeSpace, taking advantage of its analysis environment and added features that facilitate the interactive analysis of gene sets. Through experimentation with several datasets, we showed that our program can effectively summarize the important conceptual themes of large gene sets, even when traditional GO-based analysis does not yield informative results. CONCLUSIONS: We conclude that the current work will provide biologists with a tool that effectively complements the existing ones for overrepresentation analysis from genomic experiments. Our program, Genelist Analyzer, is freely available at: http://workerbee.igb.uiuc.edu:8080/BeeSpace/Search.jsp
format	Text
id	pubmed-2885378
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2885378 2010-06-15 BMC Bioinformatics Research article BioMed Central 2010-05-20 /pmc/articles/PMC2885378/ /pubmed/20487560 http://dx.doi.org/10.1186/1471-2105-11-272 Text en Copyright ©2010 He et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model
topic	Research article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2885378/ https://www.ncbi.nlm.nih.gov/pubmed/20487560 http://dx.doi.org/10.1186/1471-2105-11-272
