
Computational algorithms to predict Gene Ontology annotations

BACKGROUND: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to...


Bibliographic Details
Main Authors: Pinoli, Pietro; Chicco, Davide; Masseroli, Marco
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416163/
https://www.ncbi.nlm.nih.gov/pubmed/25916950
http://dx.doi.org/10.1186/1471-2105-16-S6-S4
_version_ 1782369188089692160
author Pinoli, Pietro
Chicco, Davide
Masseroli, Marco
collection PubMed
description BACKGROUND: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues. They are incomplete, since biological knowledge is far from definitive and evolves rapidly, and they may contain some erroneous annotations. Since curating novel annotations is costly in both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful. METHODS: We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We used the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set. RESULTS: We tested our methods and their weighted variants on the Gene Ontology annotation sets of the genes of three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the results vary with the size of the input annotation set and the algorithm considered. CONCLUSIONS: Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it predicts a significant number of novel annotations, showing that it is a helpful tool for supporting scientists in the curation of gene functional annotations.
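The Latent Semantic Indexing idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary gene-by-term annotation matrix, a plain truncated SVD, and a fixed cut-off on the reconstructed values. The function name `lsi_predict`, the rank `k`, the `threshold` and the toy matrix are all hypothetical choices made for illustration only.

```python
import numpy as np


def lsi_predict(annotations, k, threshold):
    """Suggest new (gene, term) pairs from a rank-k SVD reconstruction.

    annotations: binary matrix, rows are genes, columns are terms.
    k and threshold are illustrative parameters, not values from the paper.
    """
    A = np.asarray(annotations, dtype=float)
    # Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values (truncated reconstruction).
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # A candidate is a pair that is absent in the input but scores
    # at or above the threshold in the reconstruction.
    candidates = (A_k >= threshold) & (A == 0)
    pairs = [(int(g), int(t)) for g, t in np.argwhere(candidates)]
    return pairs, A_k


# Toy data (made up): genes 0-2 share term 0, genes 3-4 share terms 2 and 3.
toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

predicted, reconstructed = lsi_predict(toy, k=2, threshold=0.4)
print(predicted)
```

In this toy matrix, gene 2 shares term 0 with two genes that also carry term 1, so the low-rank reconstruction gives the missing pair (gene 2, term 1) a relatively high score. Weighting the input matrix before the decomposition, as the abstract proposes, would change these scores; no particular weighting scheme is assumed here.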
format Online
Article
Text
id pubmed-4416163
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4416163 2015-05-07 Computational algorithms to predict Gene Ontology annotations Pinoli, Pietro Chicco, Davide Masseroli, Marco BMC Bioinformatics Research BioMed Central 2015-04-17 /pmc/articles/PMC4416163/ /pubmed/25916950 http://dx.doi.org/10.1186/1471-2105-16-S6-S4 Text en Copyright © 2015 Pinoli et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Computational algorithms to predict Gene Ontology annotations
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416163/
https://www.ncbi.nlm.nih.gov/pubmed/25916950
http://dx.doi.org/10.1186/1471-2105-16-S6-S4