Improving accuracy of protein-protein interaction prediction by considering the converse problem for sequence representation

BACKGROUND: With the development of genome-sequencing technologies, protein sequences are readily obtained by translating the measured mRNAs. Therefore, predicting protein-protein interactions from the sequences is in great demand. The reason lies in the fact that identifying protein-protein interact...

Bibliographic Details
Main authors: Ren, Xianwen; Wang, Yong-Cui; Wang, Yong; Zhang, Xiang-Sun; Deng, Nai-Yang
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3215753/
https://www.ncbi.nlm.nih.gov/pubmed/22024143
http://dx.doi.org/10.1186/1471-2105-12-409
_version_ 1782216434541133824
author Ren, Xianwen
Wang, Yong-Cui
Wang, Yong
Zhang, Xiang-Sun
Deng, Nai-Yang
collection PubMed
description BACKGROUND: With the development of genome-sequencing technologies, protein sequences are readily obtained by translating the measured mRNAs. Therefore, predicting protein-protein interactions from the sequences is in great demand. The reason lies in the fact that identifying protein-protein interactions is becoming a bottleneck for eventually understanding the functions of proteins, especially for barely characterized organisms. Although a few methods have been proposed, the converse problem, whether the features used extract sufficient and unbiased information from protein sequences, is almost untouched. RESULTS: In this study, we interrogate this problem theoretically by an optimization scheme. Motivated by the theoretical investigation, we find novel encoding methods for both protein sequences and protein pairs. Our new methods sufficiently exploit the information of protein sequences and reduce artificial bias and computational cost. Thus, they significantly outperform the available methods regarding sensitivity, specificity, precision, and recall with cross-validation evaluation and reach ~80% and ~90% accuracy in Escherichia coli and Saccharomyces cerevisiae, respectively. Our findings hold important implications for other sequence-based prediction tasks because representation of biological sequences is always the first step in computational biology. CONCLUSIONS: By considering the converse problem, we propose new representation methods for both protein sequences and protein pairs. The results show that our method significantly improves the accuracy of protein-protein interaction predictions.
format Online
Article
Text
id pubmed-3215753
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3215753 2011-11-15 Improving accuracy of protein-protein interaction prediction by considering the converse problem for sequence representation Ren, Xianwen Wang, Yong-Cui Wang, Yong Zhang, Xiang-Sun Deng, Nai-Yang BMC Bioinformatics Research Article
BioMed Central 2011-10-24 /pmc/articles/PMC3215753/ /pubmed/22024143 http://dx.doi.org/10.1186/1471-2105-12-409 Text en Copyright ©2011 Ren et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improving accuracy of protein-protein interaction prediction by considering the converse problem for sequence representation
topic Research Article