Critical assessment of coiled-coil predictions based on protein structure data

Bibliographic details
Main authors: Simm, Dominic; Hatje, Klas; Waack, Stephan; Kollmar, Martin
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group UK, 2021
Subjects: Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8203680/
https://www.ncbi.nlm.nih.gov/pubmed/34127723
http://dx.doi.org/10.1038/s41598-021-91886-w

Journal: Sci Rep
Published online: 2021-06-14
Record: pubmed-8203680 (collection: PubMed; record format: MEDLINE/PubMed; institution: National Center for Biotechnology Information)
Rights: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

Abstract: Coiled-coil regions were among the first protein motifs described structurally and theoretically. The simplicity of the motif promises that coiled-coil regions can be detected with reasonable accuracy and precision in any protein sequence. Here, we re-evaluated the most commonly used coiled-coil prediction tools with respect to the most comprehensive reference data set available, the entire Protein Data Bank, down to each amino acid and its secondary structure. Apart from the 30-fold difference between the minimum and maximum number of predicted coiled coils, the tools vary strongly in where they predict coiled-coil regions. Accordingly, there are many false predictions and many missed true coiled-coil regions. The evaluation of the binary classification metrics in comparison with naïve coin-flip models and the calculation of the Matthews correlation coefficient, the most reliable performance metric for imbalanced data sets, suggest that the tested tools’ performance is close to random. This implies that the tools’ predictions have only limited informative value. Coiled-coil predictions are often used to interpret biochemical data and are part of in-silico functional genome annotation. Our results indicate that these predictions should be treated very cautiously and need to be supported and validated by experimental evidence.
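The abstract's argument rests on per-residue binary classification. On an imbalanced data set, where most residues are not in coiled coils, accuracy can look high even for an uninformative predictor, whereas the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), stays near zero. The Python sketch below illustrates that reasoning under stated assumptions: labels are per-residue booleans (True = residue in a coiled coil), and `coin_flip` is a hypothetical naive baseline that calls each residue positive with a fixed probability. The function names and toy data are illustrative only and are not taken from the paper's evaluation code.

```python
import math
import random
from typing import List, Sequence, Tuple

# Confusion counts in the order (TP, FP, TN, FN).
Counts = Tuple[int, int, int, int]


def confusion(truth: Sequence[bool], pred: Sequence[bool]) -> Counts:
    """Count per-residue outcomes; True means 'residue is in a coiled coil'."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of residues labelled correctly (misleading on imbalanced data)."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1]; about 0 means no association."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0 when any row or column sum is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def coin_flip(n: int, p_positive: float, seed: int = 0) -> List[bool]:
    """Hypothetical naive baseline: label each residue positive with probability p."""
    rng = random.Random(seed)
    return [rng.random() < p_positive for _ in range(n)]


if __name__ == "__main__":
    # Toy, made-up labels: 10 of 100 residues are coiled coil (imbalanced classes).
    truth = [True] * 10 + [False] * 90
    never = [False] * len(truth)  # a predictor that never calls a coiled coil
    c = confusion(truth, never)
    print(accuracy(*c), mcc(*c))  # prints: 0.9 0.0

    # A random guesser: MCC fluctuates around 0 and approaches it as n grows.
    flips = coin_flip(len(truth), p_positive=0.1, seed=1)
    c = confusion(truth, flips)
    print(accuracy(*c), mcc(*c))
```

Returning 0 when a marginal count is zero is one common convention rather than part of the MCC definition; it keeps an all-negative predictor from causing a division by zero while still reporting that it carries no information.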