
A High Efficient Biological Language Model for Predicting Protein–Protein Interactions

Many life activities and key functions in organisms are maintained by different types of protein–protein interactions (PPIs). In order to accelerate the discovery of PPIs for different species, many computational methods have been developed. Unfortunately, even though computational methods are const...


Bibliographic Details
Main Authors:	Wang, Yanbin; You, Zhu-Hong; Yang, Shan; Li, Xiao; Jiang, Tong-Hai; Zhou, Xi
Format:	Online Article Text
Language:	English
Published:	MDPI 2019
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6406841/ https://www.ncbi.nlm.nih.gov/pubmed/30717470 http://dx.doi.org/10.3390/cells8020122

author	Wang, Yanbin; You, Zhu-Hong; Yang, Shan; Li, Xiao; Jiang, Tong-Hai; Zhou, Xi
author_sort	Wang, Yanbin
collection	PubMed
description	Many life activities and key functions in organisms are maintained by different types of protein–protein interactions (PPIs). In order to accelerate the discovery of PPIs for different species, many computational methods have been developed. Unfortunately, even though computational methods are constantly evolving, efficient methods for predicting PPIs from protein sequence information have not been found for many years due to limiting factors including both methodology and technology. Inspired by the similarity of biological sequences and languages, developing a biological language processing technology may provide a brand new theoretical perspective and feasible method for the study of biological sequences. In this paper, a pure biological language processing model is proposed for predicting protein–protein interactions only using a protein sequence. The model was constructed based on a feature representation method for biological sequences called bio-to-vector (Bio2Vec) and a convolution neural network (CNN). The Bio2Vec obtains protein sequence features by using a “bio-word” segmentation system and a word representation model used for learning the distributed representation for each “bio-word”. The Bio2Vec supplies a frame that allows researchers to consider the context information and implicit semantic information of a bio sequence. A remarkable improvement in PPIs prediction performance has been observed by using the proposed model compared with state-of-the-art methods. The presentation of this approach marks the start of “bio language processing technology,” which could cause a technological revolution and could be applied to improve the quality of predictions in other problems.
format	Online Article Text
id	pubmed-6406841
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-6406841 2019-03-19 Cells Article MDPI 2019-02-03 /pmc/articles/PMC6406841/ /pubmed/30717470 http://dx.doi.org/10.3390/cells8020122 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	A High Efficient Biological Language Model for Predicting Protein–Protein Interactions
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6406841/ https://www.ncbi.nlm.nih.gov/pubmed/30717470 http://dx.doi.org/10.3390/cells8020122
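
The abstract in this record describes Bio2Vec as a "bio-word" segmentation step followed by a learned vector for each bio-word, which then feeds a CNN. The record does not say how the segmentation is done, so the sketch below substitutes a common stand-in: overlapping fixed-length k-mers. The function names (`segment`, `build_vocab`, `encode`), the choice `k=3`, the padding length, and the toy sequences are all assumptions made for illustration and are not taken from the paper. It shows only how a protein sequence could be turned into a fixed-length list of integer IDs, the form an embedding layer would typically consume.

```python
from collections import Counter


def segment(sequence, k=3):
    """Split a protein sequence into overlapping k-mers (stand-in "bio-words")."""
    seq = sequence.strip().upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(sequences, k=3):
    """Map each k-mer to an integer ID, most frequent first; 0 is reserved."""
    counts = Counter(w for s in sequences for w in segment(s, k))
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}


def encode(sequence, vocab, k=3, max_len=8):
    """Convert a sequence to a fixed-length ID list (0 = padding or unknown)."""
    ids = [vocab.get(w, 0) for w in segment(sequence, k)][:max_len]
    return ids + [0] * (max_len - len(ids))


# Made-up toy sequences, not real proteins from any dataset.
proteins = ["MKTAYIAK", "MKTLLV"]
vocab = build_vocab(proteins)
encoded = [encode(p, vocab) for p in proteins]
```

In a full pipeline, each ID would index a row of an embedding matrix learned by a word-representation model, and the encoded pair of proteins would be passed to a classifier; both of those stages are outside this sketch.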


Similar Items