
SNARER: new molecular descriptors for SNARE proteins classification

BACKGROUND: SNARE proteins play an important role in different biological functions. This study aims to investigate the contribution of a new class of molecular descriptors (called SNARER) related to the chemical-physical properties of proteins in order to evaluate the performance of binary classifi...


Bibliographic Details
Main Authors: Auriemma Citarella, Alessia, Di Biasi, Luigi, Risi, Michele, Tortora, Genoveffa
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9035248/
https://www.ncbi.nlm.nih.gov/pubmed/35462533
http://dx.doi.org/10.1186/s12859-022-04677-z
_version_ 1784693256010858496
author Auriemma Citarella, Alessia
Di Biasi, Luigi
Risi, Michele
Tortora, Genoveffa
author_facet Auriemma Citarella, Alessia
Di Biasi, Luigi
Risi, Michele
Tortora, Genoveffa
author_sort Auriemma Citarella, Alessia
collection PubMed
description BACKGROUND: SNARE proteins play an important role in different biological functions. This study aims to investigate the contribution of a new class of molecular descriptors (called SNARER) related to the chemical-physical properties of proteins in order to evaluate the performance of binary classifiers for SNARE proteins. RESULTS: We constructed a SNARE proteins balanced dataset, D128, and an unbalanced one, DUNI, on which we tested and compared the performance of the new descriptors presented here in combination with the feature sets (GAAC, CTDT, CKSAAP and 188D) already present in the literature. The machine learning algorithms used were Random Forest, k-Nearest Neighbors and AdaBoost and oversampling and subsampling techniques were applied to the unbalanced dataset. The addition of the SNARER descriptors increases the precision for all considered ML algorithms. In particular, on the unbalanced DUNI dataset the accuracy increases in parallel with the increase in sensitivity while on the balanced dataset D128 the accuracy increases compared to the counterpart without the addition of SNARER descriptors, with a strong improvement in specificity. Our best result is the combination of our descriptors SNARER with CKSAAP feature on the dataset D128 with 92.3% of accuracy, 90.1% for sensitivity and 95% for specificity with the RF algorithm. CONCLUSIONS: The performed analysis has shown how the introduction of molecular descriptors linked to the chemical-physical and structural characteristics of the proteins can improve the classification performance. Additionally, it was pointed out that performance can change based on using a balanced or unbalanced dataset. The balanced nature of training can significantly improve forecast accuracy.
format Online
Article
Text
id pubmed-9035248
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-90352482022-04-25 SNARER: new molecular descriptors for SNARE proteins classification Auriemma Citarella, Alessia Di Biasi, Luigi Risi, Michele Tortora, Genoveffa BMC Bioinformatics Research BACKGROUND: SNARE proteins play an important role in different biological functions. This study aims to investigate the contribution of a new class of molecular descriptors (called SNARER) related to the chemical-physical properties of proteins in order to evaluate the performance of binary classifiers for SNARE proteins. RESULTS: We constructed a SNARE proteins balanced dataset, D128, and an unbalanced one, DUNI, on which we tested and compared the performance of the new descriptors presented here in combination with the feature sets (GAAC, CTDT, CKSAAP and 188D) already present in the literature. The machine learning algorithms used were Random Forest, k-Nearest Neighbors and AdaBoost and oversampling and subsampling techniques were applied to the unbalanced dataset. The addition of the SNARER descriptors increases the precision for all considered ML algorithms. In particular, on the unbalanced DUNI dataset the accuracy increases in parallel with the increase in sensitivity while on the balanced dataset D128 the accuracy increases compared to the counterpart without the addition of SNARER descriptors, with a strong improvement in specificity. Our best result is the combination of our descriptors SNARER with CKSAAP feature on the dataset D128 with 92.3% of accuracy, 90.1% for sensitivity and 95% for specificity with the RF algorithm. CONCLUSIONS: The performed analysis has shown how the introduction of molecular descriptors linked to the chemical-physical and structural characteristics of the proteins can improve the classification performance. Additionally, it was pointed out that performance can change based on using a balanced or unbalanced dataset. The balanced nature of training can significantly improve forecast accuracy. 
BioMed Central 2022-04-24 /pmc/articles/PMC9035248/ /pubmed/35462533 http://dx.doi.org/10.1186/s12859-022-04677-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Auriemma Citarella, Alessia
Di Biasi, Luigi
Risi, Michele
Tortora, Genoveffa
SNARER: new molecular descriptors for SNARE proteins classification
title SNARER: new molecular descriptors for SNARE proteins classification
title_sort snarer: new molecular descriptors for snare proteins classification
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9035248/
https://www.ncbi.nlm.nih.gov/pubmed/35462533
http://dx.doi.org/10.1186/s12859-022-04677-z