Identification of all-against-all protein–protein interactions based on deep hash learning

Bibliographic Details
Main Authors: Jiang, Yue; Wang, Yuxuan; Shen, Lin; Adjeroh, Donald A.; Liu, Zhidong; Lin, Jie
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264577/
https://www.ncbi.nlm.nih.gov/pubmed/35804303
http://dx.doi.org/10.1186/s12859-022-04811-x
author Jiang, Yue
Wang, Yuxuan
Shen, Lin
Adjeroh, Donald A.
Liu, Zhidong
Lin, Jie
collection PubMed
description BACKGROUND: Protein–protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient compared with traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner, which is time-consuming. RESULTS: In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequence using deep learning techniques. This encoding scheme turns the PPI discrimination problem into a much simpler search problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database of M proteins is transformed into a much simpler problem: finding a number in a sorted array of length M. This pre-screening process narrows the search to a much smaller set of candidate proteins for further confirmation. As a final step, DHL-PPI uses the Hamming distance to verify the final PPI relationship. CONCLUSIONS: The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples from four species, DHL-PPI is shown to be superior or competitive compared with other state-of-the-art methods in terms of precision, recall, or F1 score. Furthermore, in the prediction stage, the proposed DHL-PPI reduces the time complexity from [Formula: see text] to [Formula: see text] for performing an all-against-all PPI prediction on a database of M proteins. With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets.
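The search idea in the abstract (treat each binary hash code as a number, locate it in a sorted array, then confirm with the Hamming distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The hash codes below are made-up 4-bit stand-ins for the output of the deep network, and every function name, the `window` size, and the `max_dist` threshold are hypothetical. The sketch also assumes that a small Hamming distance marks a candidate interaction, which the abstract does not state.

```python
from bisect import bisect_left


def bits_to_int(bits):
    """Pack a sequence of 0/1 values into one integer, most significant bit first."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def hamming(a, b):
    """Count the bits that differ between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def build_index(codes):
    """Sort (code, protein_id) pairs once; later lookups are binary searches."""
    return sorted((code, pid) for pid, code in codes.items())


def prescreen(index, query, window=2):
    """Binary-search the query's position and return nearby entries as candidates.

    Taking a fixed window of numeric neighbours is an assumption of this sketch.
    """
    pos = bisect_left(index, (query,))
    lo, hi = max(0, pos - window), min(len(index), pos + window)
    return index[lo:hi]


def verify(candidates, query, max_dist=1):
    """Keep only candidates whose Hamming distance to the query is small enough."""
    return [pid for code, pid in candidates if hamming(code, query) <= max_dist]


# Hypothetical 4-bit codes for four proteins (illustrative values only).
codes = {
    "P1": bits_to_int([1, 0, 1, 1]),
    "P2": bits_to_int([1, 0, 1, 0]),
    "P3": bits_to_int([0, 0, 0, 1]),
    "P4": bits_to_int([1, 1, 1, 1]),
}
index = build_index(codes)
query = bits_to_int([1, 0, 1, 1])
hits = verify(prescreen(index, query), query)
print(hits)
```

Sorting the database costs O(M log M) once, and each lookup is then a binary search rather than a scan over all M entries. Note that numeric neighbours in the sorted array are not necessarily the nearest codes in Hamming distance, so the windowing step here is only a stand-in for whatever candidate selection the paper actually uses.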
format Online
Article
Text
id pubmed-9264577
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9264577 2022-07-09
BMC Bioinformatics Research
BioMed Central 2022-07-08
/pmc/articles/PMC9264577/ /pubmed/35804303 http://dx.doi.org/10.1186/s12859-022-04811-x
Text en © The Author(s) 2022
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Identification of all-against-all protein–protein interactions based on deep hash learning
topic Research