Conceptualization of molecular findings by mining gene annotations
BACKGROUND: The Gene Ontology (GO) is an ontology representing molecular biology concepts related to genes and their products. Current annotations from the GO Consortium tend to be highly specific, and contemporary genome-scale studies often return a long list of genes of potential interest, such as...
Main Authors: | Chen, Vicky; Lu, Xinghua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4042834/ https://www.ncbi.nlm.nih.gov/pubmed/24564884 http://dx.doi.org/10.1186/1753-6561-7-S7-S2 |
_version_ | 1782318863423111168 |
---|---|
author | Chen, Vicky Lu, Xinghua |
author_facet | Chen, Vicky Lu, Xinghua |
author_sort | Chen, Vicky |
collection | PubMed |
description | BACKGROUND: The Gene Ontology (GO) is an ontology representing molecular biology concepts related to genes and their products. Current annotations from the GO Consortium tend to be highly specific, and contemporary genome-scale studies often return a long list of genes of potential interest, such as genes in a cancer tumor that are differentially expressed relative to those in normal tissue. It is therefore a challenging task to reveal, at a conceptual level, the major functional themes in which genes are involved. Presently, there is a need for tools capable of revealing such themes through mining and representing semantic information in an objective and quantitative manner. METHODS: In this study, we utilized the hierarchical organization of the GO to derive a more abstract representation of the major biological processes of a list of genes based on their annotations. We cast the task as follows: given a list of genes, identify non-disjoint, functionally coherent subsets, such that the functions of the genes in a subset are summarized by an informative GO term that accurately captures the semantic information of the original annotations. RESULTS: We evaluated different metrics for assessing information loss when merging GO terms, and different statistical schemes to assess the functional coherence of a set of genes. We found that the best discriminative power was achieved by using a combination of the information-content-based measure as the information-loss metric, and the graph-based statistics derived from a Steiner tree connecting genes in an augmented GO graph. CONCLUSIONS: Our methods provide an objective and quantitative approach to capturing the major directions of gene functions in a context-specific fashion. |
format | Online Article Text |
id | pubmed-4042834 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4042834 2014-06-17 Conceptualization of molecular findings by mining gene annotations Chen, Vicky Lu, Xinghua BMC Proc Proceedings BACKGROUND: The Gene Ontology (GO) is an ontology representing molecular biology concepts related to genes and their products. Current annotations from the GO Consortium tend to be highly specific, and contemporary genome-scale studies often return a long list of genes of potential interest, such as genes in a cancer tumor that are differentially expressed relative to those in normal tissue. It is therefore a challenging task to reveal, at a conceptual level, the major functional themes in which genes are involved. Presently, there is a need for tools capable of revealing such themes through mining and representing semantic information in an objective and quantitative manner. METHODS: In this study, we utilized the hierarchical organization of the GO to derive a more abstract representation of the major biological processes of a list of genes based on their annotations. We cast the task as follows: given a list of genes, identify non-disjoint, functionally coherent subsets, such that the functions of the genes in a subset are summarized by an informative GO term that accurately captures the semantic information of the original annotations. RESULTS: We evaluated different metrics for assessing information loss when merging GO terms, and different statistical schemes to assess the functional coherence of a set of genes. We found that the best discriminative power was achieved by using a combination of the information-content-based measure as the information-loss metric, and the graph-based statistics derived from a Steiner tree connecting genes in an augmented GO graph. CONCLUSIONS: Our methods provide an objective and quantitative approach to capturing the major directions of gene functions in a context-specific fashion. 
BioMed Central 2013-12-20 /pmc/articles/PMC4042834/ /pubmed/24564884 http://dx.doi.org/10.1186/1753-6561-7-S7-S2 Text en Copyright © 2013 Chen and Lu; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Chen, Vicky Lu, Xinghua Conceptualization of molecular findings by mining gene annotations |
title | Conceptualization of molecular findings by mining gene annotations |
title_full | Conceptualization of molecular findings by mining gene annotations |
title_fullStr | Conceptualization of molecular findings by mining gene annotations |
title_full_unstemmed | Conceptualization of molecular findings by mining gene annotations |
title_short | Conceptualization of molecular findings by mining gene annotations |
title_sort | conceptualization of molecular findings by mining gene annotations |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4042834/ https://www.ncbi.nlm.nih.gov/pubmed/24564884 http://dx.doi.org/10.1186/1753-6561-7-S7-S2 |
work_keys_str_mv | AT chenvicky conceptualizationofmolecularfindingsbymininggeneannotations AT luxinghua conceptualizationofmolecularfindingsbymininggeneannotations |
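
Illustrative note on the information-content-based loss metric named in the abstract. This record does not give the paper's exact formula, so the sketch below assumes the common corpus-based definition IC(t) = -log p(t), where p(t) is the fraction of annotated genes that carry term t or one of its descendants, and it treats the loss from merging a term into an ancestor as the drop in IC. The toy ontology, the gene names, and the functions `ancestors`, `information_content`, and `merge_loss` are hypothetical and are not taken from the article or from the real GO.

```python
import math

# Hypothetical toy ontology (child -> parents); these are not real GO terms.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sugar_metabolism": ["metabolism"],
}

# Hypothetical direct annotations (gene -> set of terms).
ANNOTATIONS = {
    "geneA": {"lipid_metabolism"},
    "geneB": {"sugar_metabolism"},
    "geneC": {"signaling"},
    "geneD": {"lipid_metabolism"},
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Assumed definition: IC(t) = -log p(t), with p(t) the fraction of
    genes annotated to t directly or through a descendant of t."""
    hits = sum(
        1
        for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    if hits == 0:
        raise ValueError(f"no gene is annotated to {term!r}")
    return -math.log(hits / len(ANNOTATIONS))


def merge_loss(term, ancestor):
    """Assumed loss: information given up when `term` is replaced by the
    more general `ancestor`."""
    if ancestor not in ancestors(term):
        raise ValueError(f"{ancestor!r} is not an ancestor of {term!r}")
    return information_content(term) - information_content(ancestor)


if __name__ == "__main__":
    for target in ("metabolism", "root"):
        print(target, round(merge_loss("lipid_metabolism", target), 3))
```

Under these assumptions, merging into a nearby ancestor costs less than merging into the root, which is the property a summarizing term needs in order to stay informative. The Steiner-tree coherence statistic mentioned in the abstract is not sketched here.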