Computational algorithms to predict Gene Ontology annotations

BACKGROUND: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to...

Bibliographic Details
Main authors:	Pinoli, Pietro; Chicco, Davide; Masseroli, Marco
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416163/ https://www.ncbi.nlm.nih.gov/pubmed/25916950 http://dx.doi.org/10.1186/1471-2105-16-S6-S4

author	Pinoli, Pietro; Chicco, Davide; Masseroli, Marco
collection	PubMed
description	BACKGROUND: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues. They are incomplete, since biological knowledge is far from definitive and evolves rapidly, and they may contain some erroneous annotations. Since curating novel annotations is costly in both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful. METHODS: We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We followed the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and we propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set. RESULTS: We tested our methods and their weighted variants on the Gene Ontology annotation sets of the genes of three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the results vary with the size of the input annotation set and the algorithm considered. CONCLUSIONS: Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it can predict a significant number of novel annotations, showing that it is a helpful tool for supporting scientists in the curation of gene functional annotations.
format	Online Article Text
id	pubmed-4416163
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4416163 2015-05-07 Computational algorithms to predict Gene Ontology annotations Pinoli, Pietro; Chicco, Davide; Masseroli, Marco BMC Bioinformatics Research BioMed Central 2015-04-17 /pmc/articles/PMC4416163/ /pubmed/25916950 http://dx.doi.org/10.1186/1471-2105-16-S6-S4 Text en Copyright © 2015 Pinoli et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Computational algorithms to predict Gene Ontology annotations
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416163/ https://www.ncbi.nlm.nih.gov/pubmed/25916950 http://dx.doi.org/10.1186/1471-2105-16-S6-S4
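
The METHODS part of the description names Latent Semantic Indexing as one of the algorithms used to infer new annotations from known ones. The following is a minimal sketch of that general idea only, assuming a binary genes x terms matrix, a truncated SVD, and a fixed score threshold. The function names, the toy matrix, the rank, and the threshold are all hypothetical, and the record does not describe the authors' weighting schemes, their Probabilistic Latent Semantic Analysis implementation, or the clustering step of their novel method, so none of those are shown.

```python
import numpy as np


def lsi_scores(annotations: np.ndarray, rank: int) -> np.ndarray:
    """Return the rank-`rank` SVD reconstruction of a genes x terms matrix."""
    u, s, vt = np.linalg.svd(annotations, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def predict_annotations(annotations: np.ndarray, rank: int, threshold: float):
    """List (gene, term, score) for absent annotations scoring >= threshold.

    The threshold is an illustrative assumption, not a value from the article.
    """
    scores = lsi_scores(annotations, rank)
    candidates = (annotations == 0) & (scores >= threshold)
    rows, cols = np.nonzero(candidates)
    return [(int(g), int(t), float(scores[g, t])) for g, t in zip(rows, cols)]


# Hypothetical toy input: rows are genes, columns are ontology terms.
toy = np.array(
    [
        [1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 0, 0],  # gene 2 resembles genes 0 and 1 but lacks term 2
        [0, 0, 0, 1],
    ],
    dtype=float,
)

for gene, term, score in predict_annotations(toy, rank=2, threshold=0.5):
    print(f"gene {gene} -> term {term} (score {score:.3f})")
```

In this toy input the low-rank reconstruction gives gene 2 a non-trivial score for term 2 because genes with the same other terms carry it, which is the intuition behind suggesting it as a candidate annotation for a curator to review.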
