Protein-Protein Interactions: Gene Acronym Redundancies and Current Limitations Precluding Automated Data Integration

Understanding protein interaction networks and their dynamic changes is a major challenge in modern biology. Currently, several experimental and in silico approaches allow the screening of protein interactors in a large-scale manner. Therefore, the bulk of information on protein interactions deposit...

Bibliographic Details
Main Authors: Casado-Vela, Juan; Matthiesen, Rune; Sellés, Susana; Naranjo, José Ramón
Format: Online Article Text
Language: English
Published: MDPI 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5314489/
https://www.ncbi.nlm.nih.gov/pubmed/28250396
http://dx.doi.org/10.3390/proteomes1010003
author Casado-Vela, Juan
Matthiesen, Rune
Sellés, Susana
Naranjo, José Ramón
collection PubMed
description Understanding protein interaction networks and their dynamic changes is a major challenge in modern biology. Several experimental and in silico approaches now allow large-scale screening of protein interactors. As a result, the bulk of information on protein interactions deposited in databases and in the peer-reviewed literature is constantly growing. Multiple databases, accessible through user-friendly web tools, have recently emerged to facilitate the retrieval and integration of protein interaction data. Nevertheless, as we show in this report, despite current efforts towards data integration, the information on protein interactions retrieved by in silico approaches is frequently incomplete and may even list false interactions. Here we point to some obstacles that preclude confident data integration, with special emphasis on protein interactions; these include gene acronym redundancies and protein synonyms. Three human proteins (choline kinase, PPIase and uromodulin) and three web-based search engines focused on protein interaction data retrieval (PSICQUIC, DASMI and BIPS) were used to illustrate errors that researchers in the field should take into account. We demonstrate that, despite recent initiatives towards data standardization, manual curation of protein interaction networks based on literature searches is still required to remove potential false positives. A three-step workflow consisting of (i) data retrieval from multiple databases, (ii) peer-reviewed literature searches, and (iii) data curation and integration is proposed as the best strategy for gathering up-to-date information on protein interactions. Finally, this strategy was applied to compile bona fide information on the human DREAM protein interactome, which constitutes a reliable training dataset that can be used to improve computational predictions.
format Online
Article
Text
id pubmed-5314489
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5314489 2017-02-27 Proteomes Article MDPI 2013-05-31 /pmc/articles/PMC5314489/ /pubmed/28250396 http://dx.doi.org/10.3390/proteomes1010003 Text en © 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
title Protein-Protein Interactions: Gene Acronym Redundancies and Current Limitations Precluding Automated Data Integration
topic Article