Computational algorithms to predict Gene Ontology annotations
BACKGROUND: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to...
Main Authors: | Pinoli, Pietro; Chicco, Davide; Masseroli, Marco |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416163/ https://www.ncbi.nlm.nih.gov/pubmed/25916950 http://dx.doi.org/10.1186/1471-2105-16-S6-S4 |
_version_ | 1782369188089692160 |
---|---|
author | Pinoli, Pietro Chicco, Davide Masseroli, Marco |
author_facet | Pinoli, Pietro Chicco, Davide Masseroli, Marco |
author_sort | Pinoli, Pietro |
collection | PubMed |
description | BACKGROUND: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues. They are incomplete, since biological knowledge is far from being definitive and it rapidly evolves, and some erroneous annotations may be present. Since the curation process of novel annotations is a costly procedure, both in economical and time terms, computational tools that can reliably predict likely annotations, and thus quicken the discovery of new gene annotations, are very useful. METHODS: We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We used the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis) and propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose the improvement of these algorithms by weighting the annotations in the input set. RESULTS: We tested our methods and their weighted variants on the Gene Ontology annotation sets of three model organism genes (Bos taurus, Danio rerio and Drosophila melanogaster). The methods showed their ability in predicting novel gene annotations and the weighting procedures demonstrated to lead to a valuable improvement, although the obtained results vary according to the dimension of the input annotation set and the considered algorithm. CONCLUSIONS: Out of the three considered methods, the Semantic IMproved Latent Semantic Analysis is the one that provides better results. In particular, when coupled with a proper weighting policy, it is able to predict a significant number of novel annotations, demonstrating to actually be a helpful tool in supporting scientists in the curation process of gene functional annotations. |
format | Online Article Text |
id | pubmed-4416163 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4416163 2015-05-07 Computational algorithms to predict Gene Ontology annotations Pinoli, Pietro Chicco, Davide Masseroli, Marco BMC Bioinformatics Research BACKGROUND: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues. They are incomplete, since biological knowledge is far from being definitive and it rapidly evolves, and some erroneous annotations may be present. Since the curation process of novel annotations is a costly procedure, both in economical and time terms, computational tools that can reliably predict likely annotations, and thus quicken the discovery of new gene annotations, are very useful. METHODS: We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We used the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis) and propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose the improvement of these algorithms by weighting the annotations in the input set. RESULTS: We tested our methods and their weighted variants on the Gene Ontology annotation sets of three model organism genes (Bos taurus, Danio rerio and Drosophila melanogaster). The methods showed their ability in predicting novel gene annotations and the weighting procedures demonstrated to lead to a valuable improvement, although the obtained results vary according to the dimension of the input annotation set and the considered algorithm. CONCLUSIONS: Out of the three considered methods, the Semantic IMproved Latent Semantic Analysis is the one that provides better results. In particular, when coupled with a proper weighting policy, it is able to predict a significant number of novel annotations, demonstrating to actually be a helpful tool in supporting scientists in the curation process of gene functional annotations. BioMed Central 2015-04-17 /pmc/articles/PMC4416163/ /pubmed/25916950 http://dx.doi.org/10.1186/1471-2105-16-S6-S4 Text en Copyright © 2015 Pinoli et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Pinoli, Pietro Chicco, Davide Masseroli, Marco Computational algorithms to predict Gene Ontology annotations |
title | Computational algorithms to predict Gene Ontology annotations |
title_full | Computational algorithms to predict Gene Ontology annotations |
title_fullStr | Computational algorithms to predict Gene Ontology annotations |
title_full_unstemmed | Computational algorithms to predict Gene Ontology annotations |
title_short | Computational algorithms to predict Gene Ontology annotations |
title_sort | computational algorithms to predict gene ontology annotations |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416163/ https://www.ncbi.nlm.nih.gov/pubmed/25916950 http://dx.doi.org/10.1186/1471-2105-16-S6-S4 |
work_keys_str_mv | AT pinolipietro computationalalgorithmstopredictgeneontologyannotations AT chiccodavide computationalalgorithmstopredictgeneontologyannotations AT masserolimarco computationalalgorithmstopredictgeneontologyannotations |