Rapid Annotation of Anonymous Sequences from Genome Projects Using Semantic Similarities and a Weighting Scheme in Gene Ontology

BACKGROUND: Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has...

Bibliographic Details
Main authors: Fontana, Paolo; Cestaro, Alessandro; Velasco, Riccardo; Formentin, Elide; Toppo, Stefano
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2645684/
https://www.ncbi.nlm.nih.gov/pubmed/19247487
http://dx.doi.org/10.1371/journal.pone.0004619
author Fontana, Paolo
Cestaro, Alessandro
Velasco, Riccardo
Formentin, Elide
Toppo, Stefano
author_sort Fontana, Paolo
collection PubMed
description BACKGROUND: Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task. METHODOLOGY: We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to process quickly thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods and in this work we have based ARGOT processing on BLAST results. CONCLUSIONS: The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was proven to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.
format Text
id pubmed-2645684
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2645684 2009-02-27 Rapid Annotation of Anonymous Sequences from Genome Projects Using Semantic Similarities and a Weighting Scheme in Gene Ontology Fontana, Paolo Cestaro, Alessandro Velasco, Riccardo Formentin, Elide Toppo, Stefano PLoS One Research Article Public Library of Science 2009-02-27 /pmc/articles/PMC2645684/ /pubmed/19247487 http://dx.doi.org/10.1371/journal.pone.0004619 Text en Fontana et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Rapid Annotation of Anonymous Sequences from Genome Projects Using Semantic Similarities and a Weighting Scheme in Gene Ontology
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2645684/
https://www.ncbi.nlm.nih.gov/pubmed/19247487
http://dx.doi.org/10.1371/journal.pone.0004619