Identification of all-against-all protein–protein interactions based on deep hash learning
BACKGROUND: Protein–protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protei...
Main Authors: | Jiang, Yue; Wang, Yuxuan; Shen, Lin; Adjeroh, Donald A.; Liu, Zhidong; Lin, Jie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264577/ https://www.ncbi.nlm.nih.gov/pubmed/35804303 http://dx.doi.org/10.1186/s12859-022-04811-x |
_version_ | 1784742993501093888 |
---|---|
author | Jiang, Yue Wang, Yuxuan Shen, Lin Adjeroh, Donald A. Liu, Zhidong Lin, Jie |
author_facet | Jiang, Yue Wang, Yuxuan Shen, Lin Adjeroh, Donald A. Liu, Zhidong Lin, Jie |
author_sort | Jiang, Yue |
collection | PubMed |
description | BACKGROUND: Protein–protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner, which is time-consuming. RESULTS: In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequences using deep learning techniques. This encoding scheme enables us to turn the PPI discrimination problem into a much simpler searching problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database with M proteins can be transformed into a much simpler problem: finding a number inside a sorted array of length M. This pre-screening process narrows down the search to a much smaller set of candidate proteins for further confirmation. As a final step, DHL-PPI uses the Hamming distance to verify the final PPI relationship. CONCLUSIONS: The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples of four species, DHL-PPI is shown to be superior or competitive when compared to other state-of-the-art methods in terms of precision, recall or F1 score. 
Furthermore, in the prediction stage, the proposed DHL-PPI reduced the time complexity from [Formula: see text] to [Formula: see text] for performing an all-against-all PPI prediction for a database with M proteins. With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets. |
format | Online Article Text |
id | pubmed-9264577 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9264577 2022-07-09 Identification of all-against-all protein–protein interactions based on deep hash learning Jiang, Yue Wang, Yuxuan Shen, Lin Adjeroh, Donald A. Liu, Zhidong Lin, Jie BMC Bioinformatics Research BioMed Central 2022-07-08 /pmc/articles/PMC9264577/ /pubmed/35804303 http://dx.doi.org/10.1186/s12859-022-04811-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Jiang, Yue Wang, Yuxuan Shen, Lin Adjeroh, Donald A. Liu, Zhidong Lin, Jie Identification of all-against-all protein–protein interactions based on deep hash learning |
title | Identification of all-against-all protein–protein interactions based on deep hash learning |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264577/ https://www.ncbi.nlm.nih.gov/pubmed/35804303 http://dx.doi.org/10.1186/s12859-022-04811-x |
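
The two-stage search outlined in the description (look up a number in a sorted array, then verify with the Hamming distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not say how the lookup key is derived or which threshold is used, so the prefix-based pre-screen, the parameters `n_bits`, `prefix_bits` and `max_dist`, and the names `build_index` and `search` are all assumptions made for illustration. The hash codes themselves would come from the trained deep model, which is outside the scope of this sketch.

```python
from bisect import bisect_left, bisect_right


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def build_index(codes: dict) -> tuple:
    """Sort (code, name) pairs once so later lookups can use binary search.

    `codes` maps a protein name to its binary hash code stored as an int.
    """
    pairs = sorted((code, name) for name, code in codes.items())
    keys = [code for code, _ in pairs]
    names = [name for _, name in pairs]
    return keys, names


def search(keys, names, query, n_bits=8, prefix_bits=4, max_dist=1):
    """Pre-screen by binary search, then verify by Hamming distance.

    Assumption (not from the source): candidates are the stored codes that
    share the top `prefix_bits` bits with the query. Those codes form one
    contiguous range in the sorted array, found in O(log M) time.
    """
    shift = n_bits - prefix_bits
    lo = (query >> shift) << shift      # smallest code with the same prefix
    hi = lo | ((1 << shift) - 1)        # largest code with the same prefix
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    # Verification step: keep candidates within the Hamming threshold.
    return [names[i] for i in range(start, end)
            if hamming(keys[i], query) <= max_dist]


# Hypothetical 8-bit codes for four proteins.
example = {"P1": 0b10110010, "P2": 0b10110011,
           "P3": 0b10111110, "P4": 0b01001101}
keys, names = build_index(example)
print(search(keys, names, 0b10110000))
```

Sorting the M codes once costs O(M log M), after which each query needs only a binary search plus a Hamming check on the short candidate list, rather than a comparison against every stored protein.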