UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB

Bibliographic Details
Main authors: Doğan, Tunca; MacDougall, Alistair; Saidi, Rabie; Poggioli, Diego; Bateman, Alex; O’Donovan, Claire; Martin, Maria J.
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4965628/
https://www.ncbi.nlm.nih.gov/pubmed/27153729
http://dx.doi.org/10.1093/bioinformatics/btw114
collection PubMed
description Motivation: Similarity-based methods have been widely used to infer the properties of genes and gene products that have little or no experimental annotation. New approaches that overcome the limitations of methods relying solely on sequence similarity are attracting increasing attention. One such approach is to use the organization of structural domains in proteins. Results: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) that compares their domain architectures, classifies proteins based on these similarities and propagates functional annotation. The performance of the method was measured through cross-validation using the Gene Ontology (GO) annotation of a subset of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. We applied the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL, which resulted in 44 818 178 GO term predictions for 12 172 114 proteins. 22% of these predictions were for 2 812 016 previously non-annotated protein entries, indicating the significance of the value added by this approach. Availability and implementation: The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/. Contact: tdogan@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
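The abstract describes transferring GO annotation between proteins that share a domain architecture, then scoring the predictions with an F-score under cross-validation. The Python sketch below illustrates that general idea under a deliberately simplified, assumed rule: proteins are grouped only when their ordered domain lists match exactly, and a GO term is transferred to unannotated members only if every annotated member of the group carries it. This is not the UniProt-DAAC algorithm, which aligns and scores architectures rather than requiring exact matches. The function names `propagate_by_architecture` and `f_score`, the protein and domain IDs and the GO labels are all hypothetical. The F-score shown is the per-protein harmonic mean of precision and recall; how the paper averages it is not stated in this record.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a domain architecture is the ordered tuple of
# domain identifiers found along a protein sequence.
Architecture = Tuple[str, ...]


def propagate_by_architecture(
    architectures: Dict[str, Architecture],
    annotations: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Predict GO terms for unannotated proteins (simplified sketch).

    Assumption: proteins are grouped by *identical* architectures only, and
    a term is propagated if every annotated member of the group has it.
    """
    groups: Dict[Architecture, List[str]] = defaultdict(list)
    for protein, arch in architectures.items():
        groups[arch].append(protein)

    predictions: Dict[str, Set[str]] = {}
    for members in groups.values():
        # Collect the GO term sets of members that already have annotation.
        annotated = [annotations[p] for p in members if annotations.get(p)]
        if not annotated:
            continue  # nothing to learn from in this group
        consensus = set.intersection(*annotated)
        if not consensus:
            continue  # annotated members disagree completely
        for protein in members:
            if not annotations.get(protein):
                predictions[protein] = set(consensus)
    return predictions


def f_score(predicted: Set[str], true: Set[str]) -> float:
    """Harmonic mean of precision and recall for one protein's GO terms."""
    tp = len(predicted & true)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: P1-P3 share architecture (D1, D2); P3 is unannotated.
    archs = {
        "P1": ("D1", "D2"),
        "P2": ("D1", "D2"),
        "P3": ("D1", "D2"),
        "P4": ("D3",),
    }
    annots = {"P1": {"GO_A", "GO_B"}, "P2": {"GO_A"}, "P4": set()}
    print(propagate_by_architecture(archs, annots))  # {'P3': {'GO_A'}}
    print(f_score({"GO_A"}, {"GO_A", "GO_B"}))
```

Requiring agreement among all annotated members trades recall for precision; a real system would instead weight transfers by how closely two architectures align.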
id pubmed-4965628
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4965628 2016-08-01 Bioinformatics Original Papers Oxford University Press 2016-08-01 2016-03-07 Text en © The Author 2016. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Papers