Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it i...
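Main authors: | Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W.; George, Ebenezer O.; Homayouni, Ramin |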
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3077411/ https://www.ncbi.nlm.nih.gov/pubmed/21533142 http://dx.doi.org/10.1371/journal.pone.0018851 |
_version_ | 1782201882725318656 |
---|---|
author | Xu, Lijing Furlotte, Nicholas Lin, Yunyue Heinrich, Kevin Berry, Michael W. George, Ebenezer O. Homayouni, Ramin |
author_facet | Xu, Lijing Furlotte, Nicholas Lin, Yunyue Heinrich, Kevin Berry, Michael W. George, Ebenezer O. Homayouni, Ramin |
author_sort | Xu, Lijing |
collection | PubMed |
description | High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. AVAILABILITY: GCAT is freely available at http://binf1.memphis.edu/gcat |
format | Text |
id | pubmed-3077411 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-30774112011-04-29 Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts Xu, Lijing Furlotte, Nicholas Lin, Yunyue Heinrich, Kevin Berry, Michael W. George, Ebenezer O. Homayouni, Ramin PLoS One Research Article High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. AVAILABILITY: GCAT is freely available at http://binf1.memphis.edu/gcat Public Library of Science 2011-04-14 /pmc/articles/PMC3077411/ /pubmed/21533142 http://dx.doi.org/10.1371/journal.pone.0018851 Text en Xu et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Xu, Lijing Furlotte, Nicholas Lin, Yunyue Heinrich, Kevin Berry, Michael W. George, Ebenezer O. Homayouni, Ramin Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts |
title | Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts |
title_full | Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts |
title_fullStr | Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts |
title_full_unstemmed | Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts |
title_short | Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts |
title_sort | functional cohesion of gene sets determined by latent semantic indexing of pubmed abstracts |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3077411/ https://www.ncbi.nlm.nih.gov/pubmed/21533142 http://dx.doi.org/10.1371/journal.pone.0018851 |
work_keys_str_mv | AT xulijing functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts AT furlottenicholas functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts AT linyunyue functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts AT heinrichkevin functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts AT berrymichaelw functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts AT georgeebenezero functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts AT homayouniramin functionalcohesionofgenesetsdeterminedbylatentsemanticindexingofpubmedabstracts |
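
Illustrative note: the description field says that gene-to-gene LSI-derived similarities are turned into a literature cohesion p-value (LPv) with a Fisher's exact test, but the record does not say how the contingency table is built. The sketch below is one plausible reading, not the authors' implementation. It assumes the pairwise similarities are already computed, that a pair counts as "related" when its similarity meets a cutoff, and that pairs inside the gene set are compared against all remaining pairs. The function names, the `threshold` parameter, and the toy genes A to F are all hypothetical.

```python
from itertools import combinations
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of seeing `a` or more
    related pairs inside the gene set, given the table margins.
    """
    n_total = a + b + c + d
    row1 = a + b  # number of pairs inside the gene set
    col1 = a + c  # number of related pairs overall
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n_total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, row1)


def cohesion_pvalue(sim, all_genes, gene_set, threshold):
    """Assumed LPv-style score: are related pairs enriched inside gene_set?

    `sim` maps a sorted gene pair (g1, g2) to a similarity score;
    missing pairs are treated as similarity 0.0 (an assumption).
    """
    gene_set = set(gene_set)
    a = b = c = d = 0
    for g1, g2 in combinations(sorted(all_genes), 2):
        related = sim.get((g1, g2), 0.0) >= threshold
        inside = g1 in gene_set and g2 in gene_set
        if inside and related:
            a += 1
        elif inside:
            b += 1
        elif related:
            c += 1
        else:
            d += 1
    return fisher_exact_greater(a, b, c, d)


# Toy data (made up): A, B, C are mutually similar; everything else is not.
genes = ["A", "B", "C", "D", "E", "F"]
toy_sim = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.85, ("D", "E"): 0.1}

p_cohesive = cohesion_pvalue(toy_sim, genes, {"A", "B", "C"}, threshold=0.5)
p_scattered = cohesion_pvalue(toy_sim, genes, {"A", "D", "F"}, threshold=0.5)
print(p_cohesive, p_scattered)
```

In this toy case the set {A, B, C} receives a small p-value because all of its pairs are related while no other pair is, and the set {A, D, F} receives a p-value of 1 because none of its pairs are related. A real analysis would take its similarities from an LSI model of the abstracts and would need a justified cutoff.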