A deep learning framework for identifying essential proteins based on multiple biological information

Bibliographic Details
Main authors: Yue, Yi; Ye, Chen; Peng, Pei-Yun; Zhai, Hui-Xin; Ahmad, Iftikhar; Xia, Chuan; Wu, Yun-Zhi; Zhang, You-Hua
Format: Online article (text)
Language: English
Published: BioMed Central, 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9351218/
https://www.ncbi.nlm.nih.gov/pubmed/35927611
http://dx.doi.org/10.1186/s12859-022-04868-8
Collection: PubMed
Description:
BACKGROUND: Essential proteins have been shown to perform vital functions in cellular processes and are indispensable for the survival and reproduction of an organism. Traditional centrality methods perform poorly on complex protein–protein interaction (PPI) networks, and machine learning approaches based on high-throughput data do not exploit the temporal and spatial dimensions of biological information.

RESULTS: We propose a deep learning framework that predicts essential proteins by integrating features obtained from the PPI network, subcellular localization, and gene expression profiles. In our model, the node2vec method learns continuous feature representations for proteins in the PPI network, capturing the diversity of connectivity patterns in the network. Depthwise separable convolution is applied to gene expression profiles to extract their properties and to capture how gene expression changes over time under different experimental conditions. Subcellular localization information is mapped into a long one-dimensional vector to capture its characteristics. Additionally, we use a sampling method to mitigate the effect of class imbalance when training the model. In experiments on Saccharomyces cerevisiae data, our model outperforms traditional centrality methods and machine learning methods. Comparative experiments also show that our way of processing the various types of biological information is preferable.

CONCLUSIONS: Our proposed deep learning framework effectively identifies essential proteins by integrating multiple types of biological data. The results show that a broader selection of subcellular localization information significantly improves prediction, and that applying depthwise separable convolution to gene expression profiles enhances performance.
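To illustrate what a depthwise separable convolution does to a gene expression profile, here is a minimal pure-Python sketch. It assumes a profile is laid out as channels (for example, experimental conditions) by time points, uses stride 1, no padding and no bias, and relies on hypothetical function names and made-up toy weights. It is not the authors' implementation: this record does not describe their layer sizes or data layout.

```python
from typing import List

Matrix = List[List[float]]


def depthwise_conv1d(x: Matrix, kernels: Matrix) -> Matrix:
    """Filter each input channel with its own kernel ('valid' mode, stride 1).

    x[c] is the time series for channel c and kernels[c] is that channel's
    filter. Channels are not mixed in this step.
    """
    if len(x) != len(kernels):
        raise ValueError("need exactly one kernel per input channel")
    out = []
    for series, kernel in zip(x, kernels):
        k = len(kernel)
        # Cross-correlation (no kernel flip), as in most deep learning libraries.
        out.append([
            sum(series[t + i] * kernel[i] for i in range(k))
            for t in range(len(series) - k + 1)
        ])
    return out


def pointwise_conv1d(x: Matrix, weights: Matrix) -> Matrix:
    """1x1 convolution: each output channel is a weighted sum of the input
    channels at the same time step. weights[o][c] maps input c to output o."""
    length = len(x[0])
    return [
        [sum(w[c] * x[c][t] for c in range(len(x))) for t in range(length)]
        for w in weights
    ]


def depthwise_separable_conv1d(x: Matrix, kernels: Matrix, weights: Matrix) -> Matrix:
    """Depthwise step (temporal filtering per channel), then pointwise mixing."""
    return pointwise_conv1d(depthwise_conv1d(x, kernels), weights)


# Toy profile: 2 channels x 5 time points (values are made up).
profile = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
kernels = [[-1.0, 1.0], [-1.0, 1.0]]  # first-difference filter: trend over time
weights = [[1.0, 1.0], [1.0, -1.0]]   # two output channels mixing both inputs
result = depthwise_separable_conv1d(profile, kernels, weights)
print(result)  # [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]]
```

The split into a per-channel temporal filter followed by a 1x1 channel mix is what makes the layer cheap. A standard convolution with C_in input channels, C_out output channels and kernel size k has C_in * C_out * k weights, while the separable version has C_in * k + C_in * C_out. For example, with C_in = 8, C_out = 16 and k = 3 that is 384 weights versus 24 + 128 = 152.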
ID: pubmed-9351218
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: BMC Bioinformatics (Research), BioMed Central, published 2022-08-04. © The Author(s) 2022.
License: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.