ASPIRER: a new computational approach for identifying non-classical secreted proteins based on deep learning

Bibliographic Details
Main authors: Wang, Xiaoyu, Li, Fuyi, Xu, Jing, Rong, Jia, Webb, Geoffrey I, Ge, Zongyuan, Li, Jian, Song, Jiangning
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921646/
https://www.ncbi.nlm.nih.gov/pubmed/35176756
http://dx.doi.org/10.1093/bib/bbac031
author Wang, Xiaoyu
Li, Fuyi
Xu, Jing
Rong, Jia
Webb, Geoffrey I
Ge, Zongyuan
Li, Jian
Song, Jiangning
collection PubMed
description Protein secretion has a pivotal role in many biological processes and is particularly important for intercellular communication, from the cytoplasm to the host or external environment. Gram-positive bacteria can secrete proteins through multiple secretion pathways. Among these, the non-classical secretion pathway has recently received increasing attention, but its exact mechanism remains unclear. Non-classical secreted proteins (NCSPs) are a class of secreted proteins lacking signal peptides and motifs. Several NCSP predictors have been proposed, and most of them use the whole amino acid sequence of NCSPs to construct the model. However, sequence length varies greatly between proteins. In addition, not all regions of a protein are equally important, and some local regions are not relevant to secretion. The functional regions of the protein, particularly the N- and C-terminal regions, contain important determinants for secretion. In this study, we propose a new hybrid deep learning-based framework, referred to as ASPIRER, which improves the prediction of NCSPs from amino acid sequences. More specifically, it combines a whole sequence-based XGBoost model and an N-terminal sequence-based convolutional neural network model; 5-fold cross-validation and independent tests demonstrate that ASPIRER outperforms existing state-of-the-art approaches. The source code and curated datasets of ASPIRER are publicly available at https://github.com/yanwu20/ASPIRER/. ASPIRER is anticipated to be a useful tool for improved prediction of novel putative NCSPs from sequence information and for prioritization of candidate proteins for follow-up experimental validation.
format Online
Article
Text
id pubmed-8921646
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8921646 2022-05-31 Brief Bioinform Problem Solving Protocol Oxford University Press 2022-02-17 /pmc/articles/PMC8921646/ /pubmed/35176756 http://dx.doi.org/10.1093/bib/bbac031 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title ASPIRER: a new computational approach for identifying non-classical secreted proteins based on deep learning
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921646/
https://www.ncbi.nlm.nih.gov/pubmed/35176756
http://dx.doi.org/10.1093/bib/bbac031