
SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins

Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain l...


Bibliographic Details
Main authors: Ahmad, Saeed; Charoenkwan, Phasit; Quinn, Julian M. W.; Moni, Mohammad Ali; Hasan, Md Mehedi; Lio’, Pietro; Shoombuatong, Watshara
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8904530/
https://www.ncbi.nlm.nih.gov/pubmed/35260777
http://dx.doi.org/10.1038/s41598-022-08173-5
author Ahmad, Saeed
Charoenkwan, Phasit
Quinn, Julian M. W.
Moni, Mohammad Ali
Hasan, Md Mehedi
Lio’, Pietro
Shoombuatong, Watshara
author_sort Ahmad, Saeed
collection PubMed
description Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored a comprehensive set of 13 feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) together with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were combined into a new feature vector. Finally, we used a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves better predictive performance than its constituent baseline models and existing methods. We anticipate that SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source code and datasets for this work are available for download from the GitHub repository (https://github.com/saeed344/SCORPION).
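The stacking scheme outlined in the description can be illustrated with a minimal, standard-library-only Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the SCORPION code: the two descriptors (`aac`, `group_composition`), the coarse residue grouping, the nearest-centroid baseline, the logistic meta-learner, and the synthetic toy sequences. The paper's 13 descriptors, 10 ML algorithms and two-step feature selection are not reproduced; the sketch only shows how out-of-fold baseline probabilities become the feature vector for a second-level model.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical coarse physicochemical grouping, for illustration only.
GROUPS = ["AVLIMFWC", "STNQGPYH", "DEKR"]

def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def group_composition(seq):
    """Fraction of residues that fall in each coarse group."""
    n = max(len(seq), 1)
    return [sum(seq.count(a) for a in g) / n for g in GROUPS]

class CentroidModel:
    """Toy baseline learner: nearest centroid with a probability-like score."""
    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        # Far from the negative centroid means a high positive score.
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

def out_of_fold_pf(X, y, k=3):
    """Probabilistic feature of one baseline model via k-fold cross-validation."""
    pf = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        model = CentroidModel().fit([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), k):
            pf[i] = model.predict_proba(X[i])
    return pf

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class LogisticMeta:
    """Meta-learner: logistic regression trained by plain SGD on the PFs."""
    def fit(self, P, y, lr=0.5, epochs=200):
        self.w, self.b = [0.0] * len(P[0]), 0.0
        for _ in range(epochs):
            for p, t in zip(P, y):
                err = self.predict_proba(p) - t
                self.w = [w - lr * err * v for w, v in zip(self.w, p)]
                self.b -= lr * err
        return self

    def predict_proba(self, p):
        return sigmoid(self.b + sum(w * v for w, v in zip(self.w, p)))

def train_stack(seqs, labels, descriptors):
    """Fit one baseline per descriptor, then a meta-model on out-of-fold PFs."""
    base_models, columns = [], []
    for desc in descriptors:
        X = [desc(s) for s in seqs]
        columns.append(out_of_fold_pf(X, labels))
        base_models.append((desc, CentroidModel().fit(X, labels)))
    P = [list(row) for row in zip(*columns)]
    return base_models, LogisticMeta().fit(P, labels)

def predict_stack(seq, base_models, meta):
    pf = [model.predict_proba(desc(seq)) for desc, model in base_models]
    return meta.predict_proba(pf)

# Synthetic toy data (NOT real PVPs): charged-rich "positives", hydrophobic "negatives".
SEQS = ["KEKDEKRKEA", "DEKKRELKED", "KRDEEKKDAE", "EKKDERKEKG", "RKEDKEEKKS",
        "KDEKRKEDEL", "LVLIAVLLGA", "AVLLIVMLFA", "LLVIAGVLIL", "VLIALLMVAK",
        "ILLVAVLGLE", "LAVLLIVVAL"]
LABELS = [1] * 6 + [0] * 6

base_models, meta = train_stack(SEQS, LABELS, [aac, group_composition])
print(predict_stack("KEDKRKEEKD", base_models, meta))
print(predict_stack("LLVVAILLAV", base_models, meta))
```

Generating the PFs out of fold, rather than from baselines that have already seen the same samples, keeps the meta-learner from simply trusting overfitted first-level scores. A feature selection step over the PF columns, as the description mentions, would sit between `out_of_fold_pf` and `LogisticMeta.fit`.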
format Online
Article
Text
id pubmed-8904530
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8904530 2022-03-09 SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins Ahmad, Saeed; Charoenkwan, Phasit; Quinn, Julian M. W.; Moni, Mohammad Ali; Hasan, Md Mehedi; Lio’, Pietro; Shoombuatong, Watshara Sci Rep Article
Nature Publishing Group UK 2022-03-08 /pmc/articles/PMC8904530/ /pubmed/35260777 http://dx.doi.org/10.1038/s41598-022-08173-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8904530/
https://www.ncbi.nlm.nih.gov/pubmed/35260777
http://dx.doi.org/10.1038/s41598-022-08173-5