PScL-2LSAESM: bioimage-based prediction of protein subcellular localization by integrating heterogeneous features with the two-level SAE-SM and mean ensemble method
MOTIVATION: Over the past decades, a variety of in silico methods have been developed to predict protein subcellular localization within cells. However, a common and major challenge in the design and development of such methods is how to effectively utilize the heterogeneous feature sets extracted f...
Main Authors: | Ullah, Matee; Hadi, Fazal; Song, Jiangning; Yu, Dong-Jun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9947927/ https://www.ncbi.nlm.nih.gov/pubmed/36413068 http://dx.doi.org/10.1093/bioinformatics/btac727 |
_version_ | 1784892667548663808 |
---|---|
author | Ullah, Matee Hadi, Fazal Song, Jiangning Yu, Dong-Jun |
author_facet | Ullah, Matee Hadi, Fazal Song, Jiangning Yu, Dong-Jun |
author_sort | Ullah, Matee |
collection | PubMed |
description | MOTIVATION: Over the past decades, a variety of in silico methods have been developed to predict protein subcellular localization within cells. However, a common and major challenge in the design and development of such methods is how to effectively utilize the heterogeneous feature sets extracted from bioimages. In this regard, limited efforts have been undertaken. RESULTS: We propose a new two-level stacked autoencoder network (termed 2L-SAE-SM) to improve its performance by integrating the heterogeneous feature sets. In particular, in the first level of 2L-SAE-SM, each optimal heterogeneous feature set is fed to train our designed stacked autoencoder network (SAE-SM). All the trained SAE-SMs in the first level can output the decision sets based on their respective optimal heterogeneous feature sets, known as ‘intermediate decision’ sets. Such intermediate decision sets are then ensembled using the mean ensemble method to generate the ‘intermediate feature’ set for the second-level SAE-SM. Using the proposed framework, we further develop a novel predictor, referred to as PScL-2LSAESM, to characterize image-based protein subcellular localization. Extensive benchmarking experiments on the latest benchmark training and independent test datasets collected from the Human Protein Atlas databank demonstrate the effectiveness of the proposed 2L-SAE-SM framework for the integration of heterogeneous feature sets. Moreover, performance comparison of the proposed PScL-2LSAESM with current state-of-the-art methods further illustrates that PScL-2LSAESM clearly outperforms the existing state-of-the-art methods for the task of protein subcellular localization. AVAILABILITY AND IMPLEMENTATION: https://github.com/csbio-njust-edu/PScL-2LSAESM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9947927 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9947927 2023-02-24 PScL-2LSAESM: bioimage-based prediction of protein subcellular localization by integrating heterogeneous features with the two-level SAE-SM and mean ensemble method Ullah, Matee Hadi, Fazal Song, Jiangning Yu, Dong-Jun Bioinformatics Original Paper MOTIVATION: Over the past decades, a variety of in silico methods have been developed to predict protein subcellular localization within cells. However, a common and major challenge in the design and development of such methods is how to effectively utilize the heterogeneous feature sets extracted from bioimages. In this regard, limited efforts have been undertaken. RESULTS: We propose a new two-level stacked autoencoder network (termed 2L-SAE-SM) to improve its performance by integrating the heterogeneous feature sets. In particular, in the first level of 2L-SAE-SM, each optimal heterogeneous feature set is fed to train our designed stacked autoencoder network (SAE-SM). All the trained SAE-SMs in the first level can output the decision sets based on their respective optimal heterogeneous feature sets, known as ‘intermediate decision’ sets. Such intermediate decision sets are then ensembled using the mean ensemble method to generate the ‘intermediate feature’ set for the second-level SAE-SM. Using the proposed framework, we further develop a novel predictor, referred to as PScL-2LSAESM, to characterize image-based protein subcellular localization. Extensive benchmarking experiments on the latest benchmark training and independent test datasets collected from the Human Protein Atlas databank demonstrate the effectiveness of the proposed 2L-SAE-SM framework for the integration of heterogeneous feature sets. Moreover, performance comparison of the proposed PScL-2LSAESM with current state-of-the-art methods further illustrates that PScL-2LSAESM clearly outperforms the existing state-of-the-art methods for the task of protein subcellular localization. AVAILABILITY AND IMPLEMENTATION: https://github.com/csbio-njust-edu/PScL-2LSAESM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2022-11-22 /pmc/articles/PMC9947927/ /pubmed/36413068 http://dx.doi.org/10.1093/bioinformatics/btac727 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Ullah, Matee Hadi, Fazal Song, Jiangning Yu, Dong-Jun PScL-2LSAESM: bioimage-based prediction of protein subcellular localization by integrating heterogeneous features with the two-level SAE-SM and mean ensemble method |
title | PScL-2LSAESM: bioimage-based prediction of protein subcellular localization by integrating heterogeneous features with the two-level SAE-SM and mean ensemble method |
title_sort | pscl-2lsaesm: bioimage-based prediction of protein subcellular localization by integrating heterogeneous features with the two-level sae-sm and mean ensemble method |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9947927/ https://www.ncbi.nlm.nih.gov/pubmed/36413068 http://dx.doi.org/10.1093/bioinformatics/btac727 |
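
The mean ensemble step described in the abstract can be illustrated with a minimal sketch. It assumes each first-level SAE-SM outputs a matrix of per-class scores (one row per image, one column per localization class) and that the "intermediate feature" set is the element-wise mean of those matrices. The function name `mean_ensemble` and the toy numbers are hypothetical and are not taken from the authors' repository; the actual implementation may differ.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mean_ensemble(decision_sets: Sequence[Matrix]) -> Matrix:
    """Element-wise mean of equally shaped decision matrices.

    Each matrix is (n_samples x n_classes). This is an illustrative
    stand-in for the mean ensemble step, not the authors' code.
    """
    if not decision_sets:
        raise ValueError("need at least one decision set")
    n_rows = len(decision_sets[0])
    n_cols = len(decision_sets[0][0]) if n_rows else 0
    for d in decision_sets:
        if len(d) != n_rows or any(len(row) != n_cols for row in d):
            raise ValueError("all decision sets must share one shape")
    n_models = len(decision_sets)
    return [
        [sum(d[i][j] for d in decision_sets) / n_models for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical intermediate decision sets from three first-level models,
# for two images and three localization classes.
d1 = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
d2 = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]
d3 = [[0.6, 0.1, 0.3], [0.3, 0.4, 0.3]]

# The averaged matrix would serve as input to the second-level model.
intermediate_features = mean_ensemble([d1, d2, d3])
```

Averaging keeps the second-level input at a fixed width (the number of classes) regardless of how many first-level models are used, which is one plausible reason to prefer it over concatenating the decision sets.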