SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins
Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain l...
Main authors: | Ahmad, Saeed; Charoenkwan, Phasit; Quinn, Julian M. W.; Moni, Mohammad Ali; Hasan, Md Mehedi; Lio’, Pietro; Shoombuatong, Watshara |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8904530/ https://www.ncbi.nlm.nih.gov/pubmed/35260777 http://dx.doi.org/10.1038/s41598-022-08173-5 |
_version_ | 1784664972154896384 |
---|---|
author | Ahmad, Saeed Charoenkwan, Phasit Quinn, Julian M. W. Moni, Mohammad Ali Hasan, Md Mehedi Lio’, Pietro Shoombuatong, Watshara |
author_facet | Ahmad, Saeed Charoenkwan, Phasit Quinn, Julian M. W. Moni, Mohammad Ali Hasan, Md Mehedi Lio’, Pietro Shoombuatong, Watshara |
author_sort | Ahmad, Saeed |
collection | PubMed |
description | Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored 13 different feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were combined into a new feature vector. Finally, we utilized a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves superior predictive performance compared with its constituent baseline models and existing methods. We anticipate SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source code and datasets for this work are available for download from the GitHub repository (https://github.com/saeed344/SCORPION). |
format | Online Article Text |
id | pubmed-8904530 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8904530 2022-03-09 SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins Ahmad, Saeed Charoenkwan, Phasit Quinn, Julian M. W. Moni, Mohammad Ali Hasan, Md Mehedi Lio’, Pietro Shoombuatong, Watshara Sci Rep Article Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored 13 different feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were combined into a new feature vector. Finally, we utilized a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves superior predictive performance compared with its constituent baseline models and existing methods. We anticipate SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source code and datasets for this work are available for download from the GitHub repository (https://github.com/saeed344/SCORPION). 
Nature Publishing Group UK 2022-03-08 /pmc/articles/PMC8904530/ /pubmed/35260777 http://dx.doi.org/10.1038/s41598-022-08173-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Ahmad, Saeed Charoenkwan, Phasit Quinn, Julian M. W. Moni, Mohammad Ali Hasan, Md Mehedi Lio’, Pietro Shoombuatong, Watshara SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins |
title | SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins |
title_full | SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins |
title_fullStr | SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins |
title_full_unstemmed | SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins |
title_short | SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins |
title_sort | scorpion is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8904530/ https://www.ncbi.nlm.nih.gov/pubmed/35260777 http://dx.doi.org/10.1038/s41598-022-08173-5 |
work_keys_str_mv | AT ahmadsaeed scorpionisastackingbasedensemblelearningframeworkforaccuratepredictionofphagevirionproteins AT charoenkwanphasit scorpionisastackingbasedensemblelearningframeworkforaccuratepredictionofphagevirionproteins AT quinnjulianmw scorpionisastackingbasedensemblelearningframeworkforaccuratepredictionofphagevirionproteins AT monimohammadali scorpionisastackingbasedensemblelearningframeworkforaccuratepredictionofphagevirionproteins AT hasanmdmehedi scorpionisastackingbasedensemblelearningframeworkforaccuratepredictionofphagevirionproteins AT liopietro scorpionisastackingbasedensemblelearningframeworkforaccuratepredictionofphagevirionproteins AT shoombuatongwatshara scorpionisastackingbasedensemblelearningframeworkforaccuratepredictionofphagevirionproteins |
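
The description field above outlines a stacking workflow: sequence descriptors feed baseline models, the baseline models emit probabilistic features (PFs), and a meta-model is trained on the PF vector. The following is a minimal sketch of that general idea only, not the authors' implementation (which is in the GitHub repository cited in the record). Everything here is an assumption made for illustration: two toy descriptors stand in for the 13 descriptors, a nearest-centroid scorer stands in for the 10 ML algorithms, the residue grouping and the example sequences are made up, and the two-step feature selection is omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical coarse physicochemical grouping, for illustration only.
GROUPS = ["AVLIMFWC", "STNQYGP", "DE", "KRH"]

def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def group_composition(seq):
    """Fraction of residues falling in each illustrative group."""
    return [sum(seq.count(a) for a in g) / len(seq) for g in GROUPS]

DESCRIPTORS = [aac, group_composition]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_centroids(X, y):
    """Toy baseline model: one mean feature vector per class (0 and 1)."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def centroid_proba(cents, x):
    """Squash the gap between the two centroid distances into (0, 1)."""
    return sigmoid(10.0 * (math.dist(x, cents[0]) - math.dist(x, cents[1])))

def out_of_fold_pf(X, y, k):
    """Score each sample with a model that was not trained on it.

    Assumes every training split still contains both classes.
    """
    pf = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), k):
            pf[i] = centroid_proba(cents, X[i])
    return pf

def fit_logistic(P, y, lr=0.5, epochs=300):
    """Meta-model: logistic regression on the PF vectors, trained by SGD."""
    w, b = [0.0] * len(P[0]), 0.0
    for _ in range(epochs):
        for p, t in zip(P, y):
            err = sigmoid(sum(wi * pi for wi, pi in zip(w, p)) + b) - t
            w = [wi - lr * err * pi for wi, pi in zip(w, p)]
            b -= lr * err
    return w, b

def train_stack(seqs, y, k=4):
    feats = [[d(s) for s in seqs] for d in DESCRIPTORS]
    # One PF column per baseline model (here, one model per descriptor).
    columns = [out_of_fold_pf(X, y, k) for X in feats]
    P = [list(row) for row in zip(*columns)]
    meta = fit_logistic(P, y)
    base = [fit_centroids(X, y) for X in feats]  # refit on all training data
    return base, meta

def predict_stack(model, seq):
    base, (w, b) = model
    p = [centroid_proba(c, d(seq)) for c, d in zip(base, DESCRIPTORS)]
    return sigmoid(sum(wi * pi for wi, pi in zip(w, p)) + b)

# Made-up toy sequences (1 = "positive", 0 = "negative"); not real PVP data.
SEQS = ["AVLIAVLKAV", "DEKRDEKRSD", "LLIVAAVLMA", "KRDEEDKRHT",
        "AVVLLIAMKA", "EDKKRRDEQD", "IVLAAVLLGA", "RKDEDEKRNE"]
LABELS = [1, 0, 1, 0, 1, 0, 1, 0]

model = train_stack(SEQS, LABELS)
```

Calling `predict_stack(model, some_sequence)` returns a score between 0 and 1 from the meta-model. The out-of-fold step matters because PFs computed on a model's own training data would be overconfident, and the meta-model would then learn to trust them too much.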