PScL-2LSAESM: bioimage-based prediction of protein subcellular localization by integrating heterogeneous features with the two-level SAE-SM and mean ensemble method

Bibliographic Details
Main authors: Ullah, Matee; Hadi, Fazal; Song, Jiangning; Yu, Dong-Jun
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2022
Subjects: Original Paper
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9947927/
https://www.ncbi.nlm.nih.gov/pubmed/36413068
http://dx.doi.org/10.1093/bioinformatics/btac727
collection PubMed
description MOTIVATION: Over the past decades, a variety of in silico methods have been developed to predict protein subcellular localization within cells. However, a common and major challenge in the design and development of such methods is how to effectively utilize the heterogeneous feature sets extracted from bioimages, and only limited efforts have been made in this regard.
RESULTS: We propose a new two-level stacked autoencoder network (termed 2L-SAE-SM) that improves prediction performance by integrating heterogeneous feature sets. In the first level of 2L-SAE-SM, each optimal heterogeneous feature set is used to train our designed stacked autoencoder network (SAE-SM). Each trained first-level SAE-SM outputs a decision set based on its own optimal heterogeneous feature set; these outputs are referred to as ‘intermediate decision’ sets. The intermediate decision sets are then combined using the mean ensemble method to generate the ‘intermediate feature’ set for the second-level SAE-SM. Using the proposed framework, we further develop a novel predictor, referred to as PScL-2LSAESM, to characterize image-based protein subcellular localization. Extensive benchmarking experiments on the latest benchmark training and independent test datasets collected from the Human Protein Atlas databank demonstrate the effectiveness of the 2L-SAE-SM framework for integrating heterogeneous feature sets. Moreover, a performance comparison of PScL-2LSAESM with current state-of-the-art methods shows that PScL-2LSAESM clearly outperforms the existing methods for protein subcellular localization.
AVAILABILITY AND IMPLEMENTATION: https://github.com/csbio-njust-edu/PScL-2LSAESM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
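The two-level pipeline described in the abstract (first-level models per feature set, mean ensemble of their decisions, second-level model on the averaged result) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, which is in the GitHub repository listed above. It assumes that each first-level SAE-SM outputs per-class scores and that the mean ensemble is an element-wise average of those score matrices. The trained SAE-SM networks are replaced by hypothetical stand-in callables (here a plain `softmax` applied to toy features), and the names `mean_ensemble` and `two_level_predict` are illustrative only.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]
# Hypothetical stand-in for a trained SAE-SM: any callable that maps one
# feature vector to a list of per-class scores.
Model = Callable[[Sequence[float]], List[float]]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax; used here only as a toy stand-in model."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mean_ensemble(decision_sets: Sequence[Matrix]) -> Matrix:
    """Element-wise mean of K decision matrices, each n_samples x n_classes."""
    if not decision_sets:
        raise ValueError("need at least one decision set")
    n_rows = len(decision_sets[0])
    n_cols = len(decision_sets[0][0]) if n_rows else 0
    for d in decision_sets:
        if len(d) != n_rows or any(len(row) != n_cols for row in d):
            raise ValueError("all decision sets must have the same shape")
    k = len(decision_sets)
    return [
        [sum(d[i][j] for d in decision_sets) / k for j in range(n_cols)]
        for i in range(n_rows)
    ]


def two_level_predict(
    first_level: Sequence[Model],
    feature_sets: Sequence[Sequence[Sequence[float]]],
    second_level: Model,
) -> List[int]:
    """Return one predicted class index per sample."""
    if len(first_level) != len(feature_sets):
        raise ValueError("expected one first-level model per feature set")
    # Level 1: each model scores every sample using its own feature set,
    # giving one 'intermediate decision' matrix per feature set.
    decision_sets = [
        [model(x) for x in features]
        for model, features in zip(first_level, feature_sets)
    ]
    # Mean ensemble: average the decision matrices into the 'intermediate
    # feature' set (n_samples x n_classes).
    intermediate = mean_ensemble(decision_sets)
    # Level 2: the second-level model scores each intermediate feature
    # vector; the predicted class is the highest-scoring index.
    predictions = []
    for row in intermediate:
        scores = second_level(row)
        predictions.append(max(range(len(scores)), key=scores.__getitem__))
    return predictions


if __name__ == "__main__":
    # Two toy feature sets for the same two images (values are made up).
    features_a = [[2.0, 0.0], [0.0, 1.0]]
    features_b = [[1.0, 1.0], [0.0, 3.0]]
    print(two_level_predict([softmax, softmax], [features_a, features_b], softmax))
    # -> [0, 1]
```

Averaging keeps the intermediate feature set the same width as a single decision set (one column per class), regardless of how many heterogeneous feature sets feed the first level; concatenating the decision sets would be an alternative design, but it is not what the abstract describes.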
id pubmed-9947927
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics (Original Paper), Oxford University Press, published 2022-11-22
license © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Paper