
The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases



Bibliographic Details
Main Authors: Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning
Format: Text
Language: English
Published: BioMed Central 2007
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2151082/
https://www.ncbi.nlm.nih.gov/pubmed/17945017
http://dx.doi.org/10.1186/1471-2105-8-401
Abstract:
BACKGROUND: Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources, or when querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist, but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs.

RESULTS: We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface.

CONCLUSION: We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at .
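The abstract describes the core idea: two identifiers are cross-referenced when their records carry exactly the same sequence in UniParc, and results can be filtered by source database, taxonomic ID and activity status. The sketch below illustrates that idea with a toy in-memory archive. It is not PICR's code and does not use PICR's SOAP or REST API. The names `SequenceArchive`, `map_accession`, `map_sequence`, `parse_fasta` and `to_csv`, the database labels `DB_A` to `DB_C`, and the accessions are hypothetical placeholders. Keying on the normalised sequence string is an assumption that stands in for however UniParc actually indexes sequences.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CrossRef:
    database: str    # source database label (placeholder names in this sketch)
    accession: str   # identifier within that database
    taxon_id: int    # NCBI taxonomy ID of the record
    active: bool     # False if the record is no longer live in its source


def normalise(sequence: str) -> str:
    """Drop whitespace and upper-case, so equal sequences compare equal."""
    return "".join(sequence.split()).upper()


class SequenceArchive:
    """Hypothetical stand-in for UniParc: one entry per distinct sequence."""

    def __init__(self) -> None:
        # Keyed by the full normalised sequence, so a hit means 100% identity.
        self._xrefs_by_seq: dict[str, list[CrossRef]] = {}
        self._seq_by_acc: dict[tuple[str, str], str] = {}

    def load(self, sequence: str, xref: CrossRef) -> None:
        seq = normalise(sequence)
        self._xrefs_by_seq.setdefault(seq, []).append(xref)
        self._seq_by_acc[(xref.database, xref.accession)] = seq

    def map_sequence(self, sequence: str,
                     databases: Optional[set[str]] = None,
                     taxon_id: Optional[int] = None,
                     only_active: bool = True) -> list[CrossRef]:
        """Cross-references whose sequence is identical to `sequence`, filtered."""
        hits = self._xrefs_by_seq.get(normalise(sequence), [])
        return [x for x in hits
                if (databases is None or x.database in databases)
                and (taxon_id is None or x.taxon_id == taxon_id)
                and (x.active or not only_active)]

    def map_accession(self, database: str, accession: str,
                      **filters) -> list[CrossRef]:
        """Map one identifier to the other records sharing its exact sequence."""
        seq = self._seq_by_acc.get((database, accession))
        if seq is None:
            return []
        return [x for x in self.map_sequence(seq, **filters)
                if (x.database, x.accession) != (database, accession)]


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header: Optional[str] = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_csv(query: str, xrefs: Iterable[CrossRef]) -> str:
    """Render mappings as CSV text, one row per cross-reference."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["query", "database", "accession", "taxon_id", "active"])
    for x in xrefs:
        writer.writerow([query, x.database, x.accession, x.taxon_id, x.active])
    return buf.getvalue()


if __name__ == "__main__":
    archive = SequenceArchive()
    archive.load("MKTAYIAK", CrossRef("DB_A", "A0001", 9606, True))
    archive.load("mktay iak", CrossRef("DB_B", "B0001", 9606, True))
    archive.load("MKTAYIAK", CrossRef("DB_C", "C0001", 9606, False))
    print(to_csv("A0001", archive.map_accession("DB_A", "A0001")))
```

In this sketch, a match is a dictionary lookup on the whole sequence, so no alignment is involved. This mirrors the "100% sequence identity" criterion in the abstract. A production archive might index a checksum of the sequence rather than the raw string. The inactive `DB_C` record in the demo is returned only when `only_active=False` is passed.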
Record source: National Center for Biotechnology Information, MEDLINE/PubMed record pubmed-2151082
Journal: BMC Bioinformatics (Software), 2007-10-18
Copyright © 2007 Côté et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.