
Interpreting a black box predictor to gain insights into early folding mechanisms


Bibliographic Details
Main authors: Grau, Isel; Nowé, Ann; Vranken, Wim
Format: Online article, text
Language: English
Published: Research Network of Computational and Structural Biotechnology, 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8433119/
https://www.ncbi.nlm.nih.gov/pubmed/34527196
http://dx.doi.org/10.1016/j.csbj.2021.08.041
collection PubMed
description Protein folding and function are closely connected, but the exact mechanisms by which proteins fold remain elusive. Early folding residues (EFRs) are amino acids within a particular protein that induce the very first stages of the folding process. High-resolution EFR data are available for only a few proteins; these data have previously enabled the training of a protein sequence-based machine learning 'black box' predictor (EFoldMine). Such a black box approach does not allow a direct extraction of the 'early folding rules' embedded in the protein sequence, whereas such interpretation is essential to improve our understanding of how the folding process works. Here we apply and investigate a novel 'grey box' approach to the prediction of EFRs from protein sequence to gain mechanistic residue-level insights into the sequence determinants of EFRs in proteins. We interpret the rule set for three datasets: a default set composed of natural proteins, a scrambled set composed of the scrambled default set sequences, and a set of de novo designed proteins. Finally, we relate these data to the secondary structure adopted in the folded protein and provide all information online via http://xefoldmine.bio2byte.be/, as a resource to help understand and steer early protein folding.
id pubmed-8433119
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Comput Struct Biotechnol J
published 2021-08-27
rights © 2021 The Author(s). This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Research Article