Interpreting a black box predictor to gain insights into early folding mechanisms
Protein folding and function are closely connected, but the exact mechanisms by which proteins fold remain elusive. Early folding residues (EFRs) are amino acids within a particular protein that induce the very first stages of the folding process. High-resolution EFR data are only available for few...
Main authors: | Grau, Isel; Nowé, Ann; Vranken, Wim |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8433119/ https://www.ncbi.nlm.nih.gov/pubmed/34527196 http://dx.doi.org/10.1016/j.csbj.2021.08.041 |
_version_ | 1783751309910867968 |
---|---|
author | Grau, Isel Nowé, Ann Vranken, Wim |
author_facet | Grau, Isel Nowé, Ann Vranken, Wim |
author_sort | Grau, Isel |
collection | PubMed |
description | Protein folding and function are closely connected, but the exact mechanisms by which proteins fold remain elusive. Early folding residues (EFRs) are amino acids within a particular protein that induce the very first stages of the folding process. High-resolution EFR data are only available for few proteins, which has previously enabled the training of a protein sequence-based machine learning 'black box' predictor (EFoldMine). Such a black box approach does not allow a direct extraction of the 'early folding rules' embedded in the protein sequence, whilst such interpretation is essential to improve our understanding of how the folding process works. We here apply and investigate a novel 'grey box' approach to the prediction of EFRs from protein sequence to gain mechanistic residue-level insights into the sequence determinants of EFRs in proteins. We interpret the rule set for three datasets, a default set comprised of natural proteins, a scrambled set comprised of the scrambled default set sequences, and a set of de novo designed proteins. Finally, we relate these data to the secondary structure adopted in the folded protein and provide all information online via http://xefoldmine.bio2byte.be/, as a resource to help understand and steer early protein folding. |
format | Online Article Text |
id | pubmed-8433119 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-8433119 2021-09-14 Interpreting a black box predictor to gain insights into early folding mechanisms Grau, Isel Nowé, Ann Vranken, Wim Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2021-08-27 /pmc/articles/PMC8433119/ /pubmed/34527196 http://dx.doi.org/10.1016/j.csbj.2021.08.041 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Interpreting a black box predictor to gain insights into early folding mechanisms |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8433119/ https://www.ncbi.nlm.nih.gov/pubmed/34527196 http://dx.doi.org/10.1016/j.csbj.2021.08.041 |