PIPENN: protein interface prediction from sequence with an ensemble of neural nets

MOTIVATION: The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational...

Bibliographic Details
Main Authors:	Stringer, Bas; de Ferrante, Hans; Abeln, Sanne; Heringa, Jaap; Feenstra, K Anton; Haydarlou, Reza
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9004643/ https://www.ncbi.nlm.nih.gov/pubmed/35150231 http://dx.doi.org/10.1093/bioinformatics/btac071

author	Stringer, Bas; de Ferrante, Hans; Abeln, Sanne; Heringa, Jaap; Feenstra, K Anton; Haydarlou, Reza
collection	PubMed
description	MOTIVATION: The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein–protein, protein–nucleotide and protein–small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features. RESULTS: We constructed a large dataset dubbed BioDL, comprising protein–protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein–protein, 0.823 for protein–nucleotide and 0.842 for protein–small molecule. AVAILABILITY AND IMPLEMENTATION: Source code and datasets are available at https://github.com/ibivu/pipenn/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-9004643
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9004643 2022-04-13 Bioinformatics Oxford University Press 2022-02-12 /pmc/articles/PMC9004643/ /pubmed/35150231 http://dx.doi.org/10.1093/bioinformatics/btac071 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	PIPENN: protein interface prediction from sequence with an ensemble of neural nets
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9004643/ https://www.ncbi.nlm.nih.gov/pubmed/35150231 http://dx.doi.org/10.1093/bioinformatics/btac071
