Conceptualization of molecular findings by mining gene annotations

BACKGROUND: The Gene Ontology (GO) is an ontology representing molecular biology concepts related to genes and their products. Current annotations from the GO Consortium tend to be highly specific, and contemporary genome-scale studies often return a long list of genes of potential interest, such as...

Bibliographic Details
Main authors: Chen, Vicky; Lu, Xinghua
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4042834/
https://www.ncbi.nlm.nih.gov/pubmed/24564884
http://dx.doi.org/10.1186/1753-6561-7-S7-S2
_version_ 1782318863423111168
author Chen, Vicky
Lu, Xinghua
author_facet Chen, Vicky
Lu, Xinghua
author_sort Chen, Vicky
collection PubMed
description BACKGROUND: The Gene Ontology (GO) is an ontology representing molecular biology concepts related to genes and their products. Current annotations from the GO Consortium tend to be highly specific, and contemporary genome-scale studies often return a long list of genes of potential interest, such as genes that are differentially expressed in a tumor compared with normal tissue. It is therefore a challenging task to reveal, at a conceptual level, the major functional themes in which genes are involved. Presently, there is a need for tools capable of revealing such themes through mining and representing semantic information in an objective and quantitative manner. METHODS: In this study, we utilized the hierarchical organization of the GO to derive a more abstract representation of the major biological processes of a list of genes based on their annotations. We cast the task as follows: given a list of genes, identify non-disjoint, functionally coherent subsets, such that the functions of the genes in a subset are summarized by an informative GO term that accurately captures the semantic information of the original annotations. RESULTS: We evaluated different metrics for assessing information loss when merging GO terms, and different statistical schemes to assess the functional coherence of a set of genes. We found that the best discriminative power was achieved by using a combination of the information-content-based measure as the information-loss metric, and the graph-based statistics derived from a Steiner tree connecting genes in an augmented GO graph. CONCLUSIONS: Our methods provide an objective and quantitative approach to capturing the major directions of gene functions in a context-specific fashion.
format Online
Article
Text
id pubmed-4042834
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4042834 2014-06-17 Conceptualization of molecular findings by mining gene annotations Chen, Vicky Lu, Xinghua BMC Proc Proceedings BACKGROUND: The Gene Ontology (GO) is an ontology representing molecular biology concepts related to genes and their products. Current annotations from the GO Consortium tend to be highly specific, and contemporary genome-scale studies often return a long list of genes of potential interest, such as genes that are differentially expressed in a tumor compared with normal tissue. It is therefore a challenging task to reveal, at a conceptual level, the major functional themes in which genes are involved. Presently, there is a need for tools capable of revealing such themes through mining and representing semantic information in an objective and quantitative manner. METHODS: In this study, we utilized the hierarchical organization of the GO to derive a more abstract representation of the major biological processes of a list of genes based on their annotations. We cast the task as follows: given a list of genes, identify non-disjoint, functionally coherent subsets, such that the functions of the genes in a subset are summarized by an informative GO term that accurately captures the semantic information of the original annotations. RESULTS: We evaluated different metrics for assessing information loss when merging GO terms, and different statistical schemes to assess the functional coherence of a set of genes. We found that the best discriminative power was achieved by using a combination of the information-content-based measure as the information-loss metric, and the graph-based statistics derived from a Steiner tree connecting genes in an augmented GO graph. CONCLUSIONS: Our methods provide an objective and quantitative approach to capturing the major directions of gene functions in a context-specific fashion.
BioMed Central 2013-12-20 /pmc/articles/PMC4042834/ /pubmed/24564884 http://dx.doi.org/10.1186/1753-6561-7-S7-S2 Text en Copyright © 2013 Chen and Lu; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Proceedings
Chen, Vicky
Lu, Xinghua
Conceptualization of molecular findings by mining gene annotations
title Conceptualization of molecular findings by mining gene annotations
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4042834/
https://www.ncbi.nlm.nih.gov/pubmed/24564884
http://dx.doi.org/10.1186/1753-6561-7-S7-S2