
Automatically extracting functionally equivalent proteins from SwissProt

BACKGROUND: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While it is usually the case that orthology implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The i...


Bibliographic Details
Main authors: McMillan, Lisa EM, Martin, Andrew CR
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2576269/
https://www.ncbi.nlm.nih.gov/pubmed/18838004
http://dx.doi.org/10.1186/1471-2105-9-418
_version_ 1782160381793271808
author McMillan, Lisa EM
Martin, Andrew CR
author_facet McMillan, Lisa EM
Martin, Andrew CR
author_sort McMillan, Lisa EM
collection PubMed
description BACKGROUND: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While orthology usually implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/Swiss-Prot, and manual analysis of these data allows FEPs to be extracted on a one-off basis. However, there has been no resource allowing the easy, automatic extraction of groups of FEPs – for example, all instances of protein C. We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/Swiss-Prot, which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text mining approach. RESULTS: Large-scale evaluation of our FEP extraction method is difficult as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance. CONCLUSION: In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot to enable groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format that would facilitate text mining of UniProtKB/Swiss-Prot.
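The filtering step summarised in the description (build a candidate list of homologues, then drop those whose functional annotation differs) can be illustrated with a minimal Python sketch. This is not FOSTA's actual algorithm, which the abstract does not specify; it assumes that comparing normalised description strings is an acceptable stand-in for the "simple text mining approach". The function names `normalise` and `filter_feps` and the `HYPO_` accession codes are hypothetical.

```python
import re


def normalise(description: str) -> str:
    """Lower-case a description, drop bracketed qualifiers and punctuation."""
    # Remove parenthesised qualifiers such as EC numbers or "(Fragment)".
    text = re.sub(r"\([^)]*\)", " ", description.lower())
    # Replace anything that is not a letter, digit or space with a space.
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


def filter_feps(query_description, candidates):
    """Keep candidates whose normalised description matches the query's.

    `candidates` is a list of (accession, description) pairs; in a real
    pipeline these would come from a homology search (an assumption here).
    """
    target = normalise(query_description)
    return [acc for acc, desc in candidates if normalise(desc) == target]


# Hypothetical candidate homologues; accessions are made up for illustration.
candidates = [
    ("HYPO_0001", "Vitamin K-dependent protein C (EC 3.4.21.69)"),
    ("HYPO_0002", "Vitamin K-dependent protein C"),
    ("HYPO_0003", "Vitamin K-dependent protein S"),
]

feps = filter_feps("Vitamin K-dependent protein C", candidates)
print(feps)
```

Exact matching after normalisation is deliberately strict: it keeps the two protein C entries and rejects protein S, but it would also reject synonyms, which is one reason a real system needs richer annotation handling.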
format Text
id pubmed-2576269
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2576269 2008-10-31 Automatically extracting functionally equivalent proteins from SwissProt McMillan, Lisa EM Martin, Andrew CR BMC Bioinformatics Database BioMed Central 2008-10-06 /pmc/articles/PMC2576269/ /pubmed/18838004 http://dx.doi.org/10.1186/1471-2105-9-418 Text en Copyright © 2008 McMillan and Martin; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Database
McMillan, Lisa EM
Martin, Andrew CR
Automatically extracting functionally equivalent proteins from SwissProt
title Automatically extracting functionally equivalent proteins from SwissProt
title_full Automatically extracting functionally equivalent proteins from SwissProt
title_fullStr Automatically extracting functionally equivalent proteins from SwissProt
title_full_unstemmed Automatically extracting functionally equivalent proteins from SwissProt
title_short Automatically extracting functionally equivalent proteins from SwissProt
title_sort automatically extracting functionally equivalent proteins from swissprot
topic Database
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2576269/
https://www.ncbi.nlm.nih.gov/pubmed/18838004
http://dx.doi.org/10.1186/1471-2105-9-418
work_keys_str_mv AT mcmillanlisaem automaticallyextractingfunctionallyequivalentproteinsfromswissprot
AT martinandrewcr automaticallyextractingfunctionallyequivalentproteinsfromswissprot