
Multi-level machine learning prediction of protein–protein interactions in Saccharomyces cerevisiae

Accurate identification of protein–protein interactions (PPI) is a key step in understanding proteins’ biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information...


Bibliographic Details
Main Authors:	Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2015
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4493684/ https://www.ncbi.nlm.nih.gov/pubmed/26157620 http://dx.doi.org/10.7717/peerj.1041

author	Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz
author_sort	Zubek, Julian
collection	PubMed
description	Accurate identification of protein–protein interactions (PPI) is a key step in understanding proteins’ biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for the prediction of protein–protein interactions. We start with carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB). First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Second, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein–protein interaction is made using a 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale (i.e., residue-level) prediction. The level-II predictor improves the results further by means of a more complex learning paradigm. We perform a 30-fold macro-scale (i.e., protein-level) cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods yield results below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that a multi-scale sequence feature aggregation procedure can improve machine learning results by more than 10% compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
format	Online Article Text
id	pubmed-4493684
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-4493684 2015-07-08 PeerJ Bioinformatics PeerJ Inc. 2015-07-02 /pmc/articles/PMC4493684/ /pubmed/26157620 http://dx.doi.org/10.7717/peerj.1041 Text en © 2015 Zubek et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	Multi-level machine learning prediction of protein–protein interactions in Saccharomyces cerevisiae
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4493684/ https://www.ncbi.nlm.nih.gov/pubmed/26157620 http://dx.doi.org/10.7717/peerj.1041
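
The description above outlines a two-level scheme: a level-I predictor scores pairs of short sequence segments, and the scores for all segment pairs of two proteins form a 2D matrix used for the final protein-level prediction. The following is a minimal Python sketch of that data flow only. It is not the authors' implementation: the window width, the function names, the hydrophobicity-based `toy_level1_score`, and the top-k averaging in `toy_level2_score` are all hypothetical placeholders standing in for the trained classifiers described in the record.

```python
from typing import Callable, List


def segments(seq: str, width: int) -> List[str]:
    """Return every contiguous window of `width` residues, sliding by one."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


def toy_level1_score(seg_a: str, seg_b: str) -> float:
    """Hypothetical stand-in for a trained level-I segment classifier.

    Scores a segment pair by the fraction of aligned positions where both
    residues are hydrophobic. A real predictor would be learned from data.
    """
    hydrophobic = set("AVILMFWC")
    hits = sum(a in hydrophobic and b in hydrophobic
               for a, b in zip(seg_a, seg_b))
    return hits / len(seg_a)


def interaction_matrix(seq_a: str, seq_b: str, width: int,
                       score: Callable[[str, str], float]) -> List[List[float]]:
    """Build the all-against-all matrix of segment-pair scores.

    Rows index segments of seq_a, columns index segments of seq_b.
    """
    segs_b = segments(seq_b, width)
    return [[score(sa, sb) for sb in segs_b] for sa in segments(seq_a, width)]


def toy_level2_score(matrix: List[List[float]], top_k: int = 3) -> float:
    """Hypothetical level-II aggregation: mean of the top_k matrix entries.

    The record describes a learned level-II predictor; this simple
    aggregation is an assumption used only to complete the sketch.
    """
    flat = sorted((v for row in matrix for v in row), reverse=True)
    top = flat[:top_k]
    return sum(top) / len(top) if top else 0.0


if __name__ == "__main__":
    m = interaction_matrix("AVILMK", "KDEAVI", 3, toy_level1_score)
    print(len(m), "x", len(m[0]), "matrix; protein-level score:",
          toy_level2_score(m))
```

Passing the scorer as a function argument keeps the matrix construction independent of whichever level-I model is used, so a trained classifier could be substituted for the toy scorer without changing the rest of the sketch.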
