
Deep Neural Network Based Predictions of Protein Interactions Using Primary Sequences

Machine learning based predictions of protein–protein interactions (PPIs) could provide valuable insights into protein functions, disease occurrence, and therapy design on a large scale. The intensive feature engineering in most of these methods makes the prediction task more tedious. Th...


Bibliographic Details
Main Authors:	Li, Hang, Gong, Xiu-Jun, Yu, Hua, Zhou, Chang
Format:	Online Article Text
Language:	English
Published:	MDPI 2018
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6222503/ https://www.ncbi.nlm.nih.gov/pubmed/30071670 http://dx.doi.org/10.3390/molecules23081923

author	Li, Hang Gong, Xiu-Jun Yu, Hua Zhou, Chang
collection	PubMed
description	Machine learning based predictions of protein–protein interactions (PPIs) could provide valuable insights into protein functions, disease occurrence, and therapy design on a large scale. The intensive feature engineering in most of these methods makes the prediction task more tedious. The emerging deep learning technology, which enables automatic feature engineering, is gaining great success in various fields. However, the over-fitting and generalization of its models are not yet well investigated in most scenarios. Here, we present a deep neural network framework (DNN-PPI) for predicting PPIs using features learned automatically only from protein primary sequences. Within the framework, the sequences of two interacting proteins are sequentially fed into the encoding, embedding, convolutional neural network (CNN), and long short-term memory (LSTM) neural network layers. Then, a concatenated vector of the two outputs from the previous layer is wired as the input of the fully connected neural network. Finally, the Adam optimizer is applied to learn the network weights in a back-propagation fashion. The different types of features, including semantic associations between amino acids, position-related sequence segments (motifs), and their long- and short-term dependencies, are captured in the embedding, CNN, and LSTM layers, respectively. When the model was trained on Pan's human PPI dataset, it achieved a prediction accuracy of 98.78% with a Matthews correlation coefficient (MCC) of 97.57%. The prediction accuracies for six external datasets ranged from 92.80% to 97.89%, making them superior to those achieved with previous methods. When applied to Escherichia coli, Drosophila, and Caenorhabditis elegans datasets, DNN-PPI obtained prediction accuracies of 95.949%, 98.389%, and 98.669%, respectively. The performances in cross-species testing among the four species above were consistent with their evolutionary distances. However, when Mus musculus was tested using the models from those species, all of them obtained prediction accuracies of over 92.43%, which is difficult to achieve and worthy of note for further study. These results suggest that DNN-PPI has remarkable generalization and is a promising tool for identifying protein interactions.
format	Online Article Text
id	pubmed-6222503
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-6222503 2018-11-13 Deep Neural Network Based Predictions of Protein Interactions Using Primary Sequences Li, Hang Gong, Xiu-Jun Yu, Hua Zhou, Chang Molecules Article MDPI 2018-08-01 /pmc/articles/PMC6222503/ /pubmed/30071670 http://dx.doi.org/10.3390/molecules23081923 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Deep Neural Network Based Predictions of Protein Interactions Using Primary Sequences
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6222503/ https://www.ncbi.nlm.nih.gov/pubmed/30071670 http://dx.doi.org/10.3390/molecules23081923
work_keys_str_mv	AT lihang deepneuralnetworkbasedpredictionsofproteininteractionsusingprimarysequences AT gongxiujun deepneuralnetworkbasedpredictionsofproteininteractionsusingprimarysequences AT yuhua deepneuralnetworkbasedpredictionsofproteininteractionsusingprimarysequences AT zhouchang deepneuralnetworkbasedpredictionsofproteininteractionsusingprimarysequences
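
The description above mentions an encoding step for primary sequences and reports results as accuracy and MCC. The following is a minimal Python sketch of both ideas, not the authors' implementation. The integer coding scheme, the padding value 0, and the names `encode_sequence`, `accuracy`, and `mcc` are assumptions made for illustration; the record does not state how DNN-PPI encodes residues. The MCC formula is the standard one computed from a binary confusion matrix.

```python
from math import sqrt

# The 20 standard amino acids. Index 0 is reserved for padding and for
# unknown residues (an assumption of this sketch, not the paper's scheme).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence, max_len):
    """Map a primary sequence to a fixed-length list of integer codes."""
    codes = [AA_TO_INDEX.get(residue, 0) for residue in sequence.upper()]
    codes = codes[:max_len]                      # truncate long sequences
    return codes + [0] * (max_len - len(codes))  # right-pad short ones


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified protein pairs."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the functions.
    print(encode_sequence("MKV", 5))
    print(accuracy(tp=90, tn=95, fp=5, fn=10))
    print(mcc(tp=90, tn=95, fp=5, fn=10))
```

In a pipeline like the one described, the integer codes would feed an embedding layer, and the two metrics would be computed on held-out interacting and non-interacting pairs. MCC is reported alongside accuracy because it stays informative when the two classes are unbalanced.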
