Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts

High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it i...

Bibliographic Details
Main authors: Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W.; George, Ebenezer O.; Homayouni, Ramin
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3077411/
https://www.ncbi.nlm.nih.gov/pubmed/21533142
http://dx.doi.org/10.1371/journal.pone.0018851
_version_ 1782201882725318656
author Xu, Lijing
Furlotte, Nicholas
Lin, Yunyue
Heinrich, Kevin
Berry, Michael W.
George, Ebenezer O.
Homayouni, Ramin
author_facet Xu, Lijing
Furlotte, Nicholas
Lin, Yunyue
Heinrich, Kevin
Berry, Michael W.
George, Ebenezer O.
Homayouni, Ramin
author_sort Xu, Lijing
collection PubMed
description High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. AVAILABILITY: GCAT is freely available at http://binf1.memphis.edu/gcat
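The description above says only that gene-to-gene LSI similarities feed a Fisher's exact test to produce the literature cohesion p-value (LPv). It does not say how the contingency table is built. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that pairwise similarities are split at a hypothetical `threshold`, and that the count of high-similarity pairs inside the gene set is compared with the count among background pairs. The function names, the threshold, and the sample values are all illustrative.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of seeing `a` or more counts in the top-left
    cell under the hypergeometric null with all margins fixed.
    """
    row1 = a + b          # pairs drawn from the gene set
    col1 = a + c          # pairs above the threshold, set + background
    n = a + b + c + d     # all pairs considered
    upper = min(row1, col1)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


def lpv_sketch(set_sims, background_sims, threshold):
    """Illustrative cohesion p-value (assumed construction, see lead-in).

    set_sims:        similarities for gene pairs inside the gene set
    background_sims: similarities for background gene pairs
    threshold:       hypothetical cutoff for a 'high' similarity
    """
    a = sum(1 for s in set_sims if s > threshold)
    b = len(set_sims) - a
    c = sum(1 for s in background_sims if s > threshold)
    d = len(background_sims) - c
    return fisher_exact_greater(a, b, c, d)


if __name__ == "__main__":
    # Made-up similarity values, for demonstration only.
    p = lpv_sketch([0.9, 0.8, 0.7], [0.1, 0.2, 0.05], threshold=0.5)
    print(f"illustrative LPv: {p:.3f}")
```

Under this reading, a small p-value means the gene set contains more high-similarity pairs than the background would predict, which is how the description characterizes a functionally cohesive set (LPv<0.05).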
format Text
id pubmed-3077411
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-30774112011-04-29 Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts Xu, Lijing Furlotte, Nicholas Lin, Yunyue Heinrich, Kevin Berry, Michael W. George, Ebenezer O. Homayouni, Ramin PLoS One Research Article High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). 
GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. AVAILABILITY: GCAT is freely available at http://binf1.memphis.edu/gcat Public Library of Science 2011-04-14 /pmc/articles/PMC3077411/ /pubmed/21533142 http://dx.doi.org/10.1371/journal.pone.0018851 Text en Xu et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Xu, Lijing
Furlotte, Nicholas
Lin, Yunyue
Heinrich, Kevin
Berry, Michael W.
George, Ebenezer O.
Homayouni, Ramin
Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts
title Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts
title_full Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts
title_fullStr Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts
title_full_unstemmed Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts
title_short Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts
title_sort functional cohesion of gene sets determined by latent semantic indexing of pubmed abstracts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3077411/
https://www.ncbi.nlm.nih.gov/pubmed/21533142
http://dx.doi.org/10.1371/journal.pone.0018851
work_keys_str_mv AT xulijing functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts
AT furlottenicholas functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts
AT linyunyue functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts
AT heinrichkevin functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts
AT berrymichaelw functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts
AT georgeebenezero functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts
AT homayouniramin functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts