Automatically extracting functionally equivalent proteins from SwissProt
BACKGROUND: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While it is usually the case that orthology implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The i...
Main authors: | McMillan, Lisa EM, Martin, Andrew CR |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | Database |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2576269/ https://www.ncbi.nlm.nih.gov/pubmed/18838004 http://dx.doi.org/10.1186/1471-2105-9-418 |
_version_ | 1782160381793271808 |
---|---|
author | McMillan, Lisa EM Martin, Andrew CR |
author_facet | McMillan, Lisa EM Martin, Andrew CR |
author_sort | McMillan, Lisa EM |
collection | PubMed |
description | BACKGROUND: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While it is usually the case that orthology implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/Swiss-Prot and a manual analysis of these data allow FEPs to be extracted on a one-off basis. However there has been no resource allowing the easy, automatic extraction of groups of FEPs – for example, all instances of protein C. We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/Swiss-Prot which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text mining approach. RESULTS: Large scale evaluation of our FEP extraction method is difficult as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance. CONCLUSION: In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot to enable groups of proteins already annotated as functionally equivalent, to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format, which would facilitate text-mining of UniProtKB/Swiss-Prot. |
format | Text |
id | pubmed-2576269 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2576269 2008-10-31 Automatically extracting functionally equivalent proteins from SwissProt McMillan, Lisa EM Martin, Andrew CR BMC Bioinformatics Database BACKGROUND: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While it is usually the case that orthology implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/Swiss-Prot and a manual analysis of these data allow FEPs to be extracted on a one-off basis. However there has been no resource allowing the easy, automatic extraction of groups of FEPs – for example, all instances of protein C. We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/Swiss-Prot which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text mining approach. RESULTS: Large scale evaluation of our FEP extraction method is difficult as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance. CONCLUSION: In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot to enable groups of proteins already annotated as functionally equivalent, to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format, which would facilitate text-mining of UniProtKB/Swiss-Prot. BioMed Central 2008-10-06 /pmc/articles/PMC2576269/ /pubmed/18838004 http://dx.doi.org/10.1186/1471-2105-9-418 Text en Copyright © 2008 McMillan and Martin; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Database McMillan, Lisa EM Martin, Andrew CR Automatically extracting functionally equivalent proteins from SwissProt |
title | Automatically extracting functionally equivalent proteins from SwissProt |
title_full | Automatically extracting functionally equivalent proteins from SwissProt |
title_fullStr | Automatically extracting functionally equivalent proteins from SwissProt |
title_full_unstemmed | Automatically extracting functionally equivalent proteins from SwissProt |
title_short | Automatically extracting functionally equivalent proteins from SwissProt |
title_sort | automatically extracting functionally equivalent proteins from swissprot |
topic | Database |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2576269/ https://www.ncbi.nlm.nih.gov/pubmed/18838004 http://dx.doi.org/10.1186/1471-2105-9-418 |
work_keys_str_mv | AT mcmillanlisaem automaticallyextractingfunctionallyequivalentproteinsfromswissprot AT martinandrewcr automaticallyextractingfunctionallyequivalentproteinsfromswissprot |