Improving compound–protein interaction prediction by building up highly credible negative samples

Motivation: Computational prediction of compound–protein interactions (CPIs) is of great importance for drug design and development, as genome-scale experimental validation of CPIs is not only time-consuming but also prohibitively expensive. With the availability of an increasing number of validated...

Bibliographic Details
Main authors:	Liu, Hui; Sun, Jianjiang; Guan, Jihong; Zheng, Jie; Zhou, Shuigeng
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2015
Subjects:	Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765858/ https://www.ncbi.nlm.nih.gov/pubmed/26072486 http://dx.doi.org/10.1093/bioinformatics/btv256

_version_	1782417583844098048
author	Liu, Hui Sun, Jianjiang Guan, Jihong Zheng, Jie Zhou, Shuigeng
author_facet	Liu, Hui Sun, Jianjiang Guan, Jihong Zheng, Jie Zhou, Shuigeng
author_sort	Liu, Hui
collection	PubMed
description	Motivation: Computational prediction of compound–protein interactions (CPIs) is of great importance for drug design and development, as genome-scale experimental validation of CPIs is not only time-consuming but also prohibitively expensive. With the availability of an increasing number of validated interactions, the performance of computational prediction approaches is severely impeded by the lack of reliable negative CPI samples. A systematic method of screening reliable negative samples becomes critical to improving the performance of in silico prediction methods. Results: This article aims at building up a set of highly credible negative samples of CPIs via an in silico screening method. As most existing computational models assume that similar compounds are likely to interact with similar target proteins and achieve remarkable performance, it is rational to identify potential negative samples based on the converse negative proposition that the proteins dissimilar to every known/predicted target of a compound are unlikely to be targeted by the compound and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, amino acid sequences, protein–protein interaction network and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers, and all these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples for both human and Caenorhabditis elegans. We then verified the negative samples on three existing prediction models, including bipartite local model, Gaussian kernel profile and Bayesian matrix factorization, and found that the performances of these models are also significantly improved on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset. 
Finally, we derived two sets of new interactions by training a support vector machine classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to the current curated compound–protein databases. Availability: Supplementary files are available at: http://admis.fudan.edu.cn/negative-cpi/. Contact: sgzhou@fudan.edu.cn Supplementary Information: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-4765858
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-47658582016-03-04 Improving compound–protein interaction prediction by building up highly credible negative samples Liu, Hui Sun, Jianjiang Guan, Jihong Zheng, Jie Zhou, Shuigeng Bioinformatics Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland Oxford University Press 2015-06-15 2015-06-10 /pmc/articles/PMC4765858/ /pubmed/26072486 http://dx.doi.org/10.1093/bioinformatics/btv256 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland Liu, Hui Sun, Jianjiang Guan, Jihong Zheng, Jie Zhou, Shuigeng Improving compound–protein interaction prediction by building up highly credible negative samples
title	Improving compound–protein interaction prediction by building up highly credible negative samples
title_full	Improving compound–protein interaction prediction by building up highly credible negative samples
title_fullStr	Improving compound–protein interaction prediction by building up highly credible negative samples
title_full_unstemmed	Improving compound–protein interaction prediction by building up highly credible negative samples
title_short	Improving compound–protein interaction prediction by building up highly credible negative samples
title_sort	improving compound–protein interaction prediction by building up highly credible negative samples
topic	Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765858/ https://www.ncbi.nlm.nih.gov/pubmed/26072486 http://dx.doi.org/10.1093/bioinformatics/btv256
work_keys_str_mv	AT liuhui improvingcompoundproteininteractionpredictionbybuildinguphighlycrediblenegativesamples AT sunjianjiang improvingcompoundproteininteractionpredictionbybuildinguphighlycrediblenegativesamples AT guanjihong improvingcompoundproteininteractionpredictionbybuildinguphighlycrediblenegativesamples AT zhengjie improvingcompoundproteininteractionpredictionbybuildinguphighlycrediblenegativesamples AT zhoushuigeng improvingcompoundproteininteractionpredictionbybuildinguphighlycrediblenegativesamples
