
How to inherit statistically validated annotation within BAR+ protein clusters

BACKGROUND: In the genomic era a key issue is protein annotation, namely how to endow protein sequences, upon translation from the corresponding genes, with structural and functional features. Routinely this operation is electronically done by deriving and integrating information from previous knowl...


Bibliographic Details
Main authors: Piovesan, Damiano; Martelli, Pier Luigi; Fariselli, Piero; Profiti, Giuseppe; Zauli, Andrea; Rossi, Ivan; Casadio, Rita
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3584929/
https://www.ncbi.nlm.nih.gov/pubmed/23514411
http://dx.doi.org/10.1186/1471-2105-14-S3-S4
author Piovesan, Damiano
Martelli, Pier Luigi
Fariselli, Piero
Profiti, Giuseppe
Zauli, Andrea
Rossi, Ivan
Casadio, Rita
collection PubMed
description BACKGROUND: In the genomic era a key issue is protein annotation, namely how to endow protein sequences, upon translation from the corresponding genes, with structural and functional features. This operation is routinely done electronically by deriving and integrating information from previous knowledge. The reference database for protein sequences is UniProtKB, which is divided into two sections: UniProtKB/TrEMBL, which is automatically annotated and not reviewed, and UniProtKB/Swiss-Prot, which is manually annotated and reviewed. The annotation process is essentially based on sequence similarity search. The question therefore arises as to what extent annotation based on transfer by inheritance is valuable, and specifically whether it is possible to statistically validate inherited features when little homology exists between the target sequence and its template(s).
RESULTS: In this paper we address the problem of annotating protein sequences in a statistically validated manner, taking UniProtKB as the reference annotation resource. The test case is the set of 48,298 proteins recently released by the Critical Assessment of Function Annotations (CAFA) organization. We show that we can transfer, after validation, Gene Ontology (GO) terms of the three main categories and Pfam domains to about 68% and 72% of the sequences, respectively. This is possible after alignment of the CAFA sequences against BAR+, our annotation resource, which discriminates between statistically validated and not statistically validated annotation. By comparing with a direct UniProtKB annotation, we find that, besides validating the annotation of some 78% of the CAFA set, we assign new and statistically validated annotation to 14.8% of the sequences and find new structural templates for about 25% of the chains, half of which share less than 30% sequence identity with the corresponding template(s).
CONCLUSION: Inheritance of annotation by transfer generally requires a careful selection of the identity value between the target and the template in order to transfer structural and/or functional features. Here we prove that even remote homologs can be safely endowed with structural templates and GO and/or Pfam terms, provided that annotation is done within clusters that collect related protein sequences and in which a statistical validation of the shared structural and functional features is possible.
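The abstract does not say which statistical test is used to validate annotation within a cluster. As a minimal sketch of the general idea only, and not the BAR+ procedure, the Python below assumes a hypergeometric over-representation test with Bonferroni correction: a term counts as validated for a cluster when it occurs there more often than expected from its background frequency, and only validated terms are passed on to a sequence that falls into that cluster. The function names, the 0.01 threshold, the background counts, and the toy GO identifiers are all illustrative assumptions.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def validated_terms(cluster_terms, background_counts, background_size, alpha=0.01):
    """Return {term: corrected p-value} for terms over-represented in a cluster.

    cluster_terms: one collection of terms per protein in the cluster.
    background_counts: term -> number of annotated proteins in the background.
    The test and the alpha threshold are assumptions, not taken from the paper.
    """
    cluster_size = len(cluster_terms)
    counts = {}
    for terms in cluster_terms:
        for term in set(terms):
            counts[term] = counts.get(term, 0) + 1

    n_tests = len(counts)  # Bonferroni: one test per distinct term in the cluster
    validated = {}
    for term, k in counts.items():
        p = hypergeom_sf(k, background_size, background_counts[term], cluster_size)
        p_adj = min(1.0, p * n_tests)
        if p_adj < alpha:
            validated[term] = p_adj
    return validated


# Toy cluster of 10 proteins drawn from a hypothetical background of 1000.
cluster = (
    [["GO:0005524", "GO:0005515"]] * 6
    + [["GO:0005524"]] * 2
    + [[]] * 2
)
background = {"GO:0005524": 50, "GO:0005515": 600}

result = validated_terms(cluster, background, background_size=1000)

# A target sequence assigned to this cluster would inherit only validated terms.
inherited = sorted(result)
```

In this toy case the rare term is strongly enriched in the cluster and survives correction, while the term that is already common in the background does not, so only the former would be transferred.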
format Online
Article
Text
id pubmed-3584929
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3584929 2013-03-11 How to inherit statistically validated annotation within BAR+ protein clusters Piovesan, Damiano; Martelli, Pier Luigi; Fariselli, Piero; Profiti, Giuseppe; Zauli, Andrea; Rossi, Ivan; Casadio, Rita BMC Bioinformatics Proceedings BioMed Central 2013-02-28 /pmc/articles/PMC3584929/ /pubmed/23514411 http://dx.doi.org/10.1186/1471-2105-14-S3-S4 Text en Copyright ©2013 Piovesan et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title How to inherit statistically validated annotation within BAR+ protein clusters
topic Proceedings