
Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation

BACKGROUND: The prediction of protein secondary structure is a crucial step for ab initio tertiary structure prediction, which provides information about protein activity and function. As experimental methods are expensive and sometimes impossible, many SS predictors, mainl...


Bibliographic Details
Main Authors: Stapor, Katarzyna, Kotowski, Krzysztof, Smolarczyk, Tomasz, Roterman, Irena
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8939211/
https://www.ncbi.nlm.nih.gov/pubmed/35317722
http://dx.doi.org/10.1186/s12859-022-04623-z
_version_ 1784672697540673536
author Stapor, Katarzyna
Kotowski, Krzysztof
Smolarczyk, Tomasz
Roterman, Irena
author_facet Stapor, Katarzyna
Kotowski, Krzysztof
Smolarczyk, Tomasz
Roterman, Irena
author_sort Stapor, Katarzyna
collection PubMed
description BACKGROUND: The prediction of protein secondary structure (SS) is a crucial step for ab initio tertiary structure prediction, which provides information about protein activity and function. As experimental methods are expensive and sometimes impossible, many SS predictors, mainly based on machine learning methods, have been proposed over the years. Currently, most of the top methods use evolutionary-based input features produced by PSSM and HHblits software, although embeddings, a new description of protein sequences generated by language models (LMs), have recently appeared and could be leveraged as input features. Apart from input feature calculation, the top models usually need extensive computational resources for training and prediction and can barely be run on a regular PC. As an imbalanced classification problem, SS prediction should not be judged by the commonly used Q3/Q8 metrics. Moreover, as the benchmark datasets are not random samples, classical statistical null hypothesis testing based on the Neyman–Pearson approach is not appropriate. RESULTS: We present ProteinUnet2, a lightweight deep network for SS prediction based on the U-Net convolutional architecture and evolutionary-based input features (from PSSM and HHblits) as well as SPOT-Contact features. Through an extensive evaluation study, we report the performance of ProteinUnet2 in comparison with top SS prediction methods based on evolutionary information (SAINT and SPOT-1D). We also propose a new statistical methodology for prediction performance assessment based on statistical significance from Fisher–Pitman permutation tests accompanied by practical significance measured by Cohen's effect size. CONCLUSIONS: Our results suggest that the ProteinUnet2 architecture has much shorter training and inference times while maintaining results similar to the SAINT and SPOT-1D predictors.
Taking into account the relatively long time needed to calculate evolutionary-based features (from PSSM in particular), it would be worth testing the predictive ability of embeddings as input features in the future. We strongly believe that the statistical methodology proposed here for the evaluation of SS prediction results will be adopted and used (and even expanded) by the research community. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04623-z.
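The evaluation idea named in the description (significance from a permutation test, reported together with Cohen's effect size) can be illustrated with a minimal sketch. This is not the authors' code: it assumes a paired design in which two predictors are scored on the same proteins, it uses a Monte Carlo sign-flip version of the permutation test, and the function names and the score lists are hypothetical.

```python
import random
from statistics import mean, stdev


def cohens_d_paired(a, b):
    """Cohen's d for paired samples: mean difference over the SD of the differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / stdev(diffs)


def paired_permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo paired permutation test on the mean difference.

    Under the null hypothesis the two scores within each pair are exchangeable,
    so the sign of every paired difference is flipped at random.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        permuted = [d if rng.random() < 0.5 else -d for d in diffs]
        # Small tolerance so floating-point noise does not drop exact ties.
        if abs(mean(permuted)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-protein scores for two predictors on the same test proteins
# (illustrative numbers only, not results from the article).
scores_a = [0.81, 0.79, 0.85, 0.77, 0.90, 0.83, 0.88, 0.80]
scores_b = [0.80, 0.80, 0.83, 0.78, 0.88, 0.82, 0.86, 0.80]

p_value = paired_permutation_test(scores_a, scores_b)
effect = cohens_d_paired(scores_a, scores_b)
print(f"p = {p_value:.4f}, Cohen's d = {effect:.2f}")
```

Reporting the effect size next to the p-value separates the question of whether a difference is likely to be real from the question of whether it is large enough to matter in practice.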
format Online
Article
Text
id pubmed-8939211
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8939211 2022-03-23 Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation Stapor, Katarzyna Kotowski, Krzysztof Smolarczyk, Tomasz Roterman, Irena BMC Bioinformatics Research BioMed Central 2022-03-22 /pmc/articles/PMC8939211/ /pubmed/35317722 http://dx.doi.org/10.1186/s12859-022-04623-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Stapor, Katarzyna
Kotowski, Krzysztof
Smolarczyk, Tomasz
Roterman, Irena
Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation
title Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation
title_full Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation
title_fullStr Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation
title_full_unstemmed Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation
title_short Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation
title_sort lightweight proteinunet2 network for protein secondary structure prediction: a step towards proper evaluation
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8939211/
https://www.ncbi.nlm.nih.gov/pubmed/35317722
http://dx.doi.org/10.1186/s12859-022-04623-z
work_keys_str_mv AT staporkatarzyna lightweightproteinunet2networkforproteinsecondarystructurepredictionasteptowardsproperevaluation
AT kotowskikrzysztof lightweightproteinunet2networkforproteinsecondarystructurepredictionasteptowardsproperevaluation
AT smolarczyktomasz lightweightproteinunet2networkforproteinsecondarystructurepredictionasteptowardsproperevaluation
AT rotermanirena lightweightproteinunet2networkforproteinsecondarystructurepredictionasteptowardsproperevaluation