
Annotation extension through protein family annotation coherence metrics


Bibliographic Details
Main Authors: Bastos, Hugo P., Clarke, Luka A., Couto, Francisco M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2013
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3795322/
https://www.ncbi.nlm.nih.gov/pubmed/24130572
http://dx.doi.org/10.3389/fgene.2013.00201
collection PubMed
description Protein functional annotation consists of associating proteins with textual descriptors that elucidate their biological roles. The bulk of annotation is done via automated procedures that ultimately rely on annotation transfer. Despite the large number of existing protein annotation procedures, the ever-growing protein space is never completely annotated. One facet of annotation incompleteness derives from annotation uncertainty: often, when a protein's function cannot be predicted with enough specificity, the protein is instead conservatively annotated with more generic terms. For protein families or functionally related (or even dissimilar) sets, this makes it harder to use annotations to compare the extent of functional relatedness among all family or set members. However, we postulate that identifying subsets of functionally coherent proteins annotated at a very specific level can help extend the annotation of other, incompletely annotated proteins within the same family or functionally related set. As an example, we analyse the annotation status of a set of CAZy families belonging to the Polysaccharide Lyase class. We show that, through visualization methods and semantic-similarity-based metrics, it is possible to identify families, and the annotation terms within them, that are suitable for annotation extension. Based on our analysis, we then propose a semi-automatic methodology leading to the extension of single annotation terms within these partially annotated protein sets or families.
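The idea summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology, the term names (`root`, `lyase`, `specific_a`, `specific_b`), the protein identifiers, the Jaccard-over-ancestors similarity, and the `min_support` threshold are all assumptions chosen for illustration, and the record does not say which semantic similarity measure the article uses. The sketch scores a family by the mean pairwise similarity of its members' annotations, then lists specific terms that could be proposed to proteins carrying only a more generic ancestor term.

```python
from itertools import combinations

# Hypothetical toy ontology: term -> set of direct parents (not real GO terms).
PARENTS = {
    "root": set(),
    "lyase": {"root"},
    "specific_a": {"lyase"},
    "specific_b": {"lyase"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def protein_similarity(terms_a, terms_b):
    """Jaccard overlap of the ancestor closures of two annotation sets.

    This is a stand-in measure, assumed here for simplicity.
    """
    closure_a = set().union(*(ancestors(t) for t in terms_a))
    closure_b = set().union(*(ancestors(t) for t in terms_b))
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


def family_coherence(annotations):
    """Mean pairwise similarity over all proteins in a family."""
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        return 0.0
    return sum(protein_similarity(a, b) for a, b in pairs) / len(pairs)


def extension_candidates(annotations, min_support=0.5):
    """Propose well-supported specific terms to generically annotated proteins.

    A term is proposed to a protein only if at least `min_support` of the
    family carries it and every term the protein already has is a strict
    ancestor of it. Proposals are meant for manual review.
    """
    suggestions = {}
    total = len(annotations)
    for term in PARENTS:
        holders = [p for p, terms in annotations.items() if term in terms]
        if total == 0 or len(holders) / total < min_support:
            continue
        strict_ancestors = ancestors(term) - {term}
        for protein, terms in annotations.items():
            if terms and term not in terms and terms <= strict_ancestors:
                suggestions.setdefault(protein, []).append(term)
    return suggestions


# Hypothetical family: two specifically annotated members, one generic.
family = {"P1": {"specific_a"}, "P2": {"specific_a"}, "P3": {"lyase"}}
print(family_coherence(family))
print(extension_candidates(family))
```

In this toy family, `P3` carries only the generic term, so the sketch proposes `specific_a` for it. Keeping the output as a list of proposals rather than applying them directly mirrors the semi-automatic character described above, where a curator would confirm each extension.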
id pubmed-3795322
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3795322 2013-10-15 Front Genet Genetics Frontiers Media S.A. 2013-10-11 /pmc/articles/PMC3795322/ /pubmed/24130572 http://dx.doi.org/10.3389/fgene.2013.00201 Text en Copyright © 2013 Bastos, Clarke and Couto.
http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics