Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures

With the debut of AlphaFold2, we can now obtain a highly accurate view of a reasonable equilibrium tertiary structure of a protein molecule. Yet a single-structure view is insufficient and does not account for the high structural plasticity of protein molecules. Obtaining a multi-structure view of a p...

Bibliographic Details
Main authors: Alam, Fardina Fathmiul; Shehu, Amarda
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9313347/
https://www.ncbi.nlm.nih.gov/pubmed/35883464
http://dx.doi.org/10.3390/biom12070908
collection PubMed
description With the debut of AlphaFold2, we can now obtain a highly accurate view of a reasonable equilibrium tertiary structure of a protein molecule. Yet a single-structure view is insufficient and does not account for the high structural plasticity of protein molecules. Obtaining a multi-structure view of a protein molecule continues to be an outstanding challenge in computational structural biology. In tandem with methods formulated under the umbrella of stochastic optimization, we are now seeing rapid advances in the capabilities of methods based on deep learning. In recent work, we advanced the capability of these models to learn from experimentally available tertiary structures of protein molecules of varying lengths. In this work, we elucidate the important role of the composition of the training dataset in the neural network’s ability to learn key local and distal patterns in tertiary structures. To make such patterns visible to the network, we utilize a contact-map-based representation of protein tertiary structure. We show how data size, quality, and composition relate to the ability of latent variable models to learn key patterns of tertiary structure. In addition, we present a disentangled latent variable model that improves upon the state-of-the-art variational autoencoder-based model in capturing key, physically realistic structural patterns. We believe this work opens up further avenues of research on deep learning-based models for computing multi-structure views of protein molecules.
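The abstract refers to a contact-map-based representation of tertiary structure, and the title to distance maps. As a minimal sketch, assuming each residue is represented by a single 3D coordinate (for example, its Cα atom) and assuming an 8 Å contact threshold, which is a common convention but not necessarily the authors' choice, the two representations can be computed as below. The function names and the toy chain are hypothetical illustrations, not code from the paper.

```python
import math
from typing import List, Sequence, Tuple

# One (x, y, z) coordinate per residue, e.g. a C-alpha atom (assumption).
Coord = Tuple[float, float, float]


def distance_map(coords: Sequence[Coord]) -> List[List[float]]:
    """Return the symmetric matrix of pairwise Euclidean distances (in angstroms)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_map(dmap: List[List[float]], threshold: float = 8.0) -> List[List[int]]:
    """Binarize a distance map: 1 where the distance is below `threshold`, else 0.

    Assumption: the 8.0 A default is a common convention, not taken from the paper.
    """
    return [[1 if d < threshold else 0 for d in row] for row in dmap]


if __name__ == "__main__":
    # Hypothetical toy chain: four residues spaced 3.8 A apart along the x-axis.
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    dmap = distance_map(chain)
    for row in contact_map(dmap):
        print(row)
```

A distance map keeps the full real-valued pairwise geometry, while a contact map discards it in favor of a binary pattern; the threshold is a tunable parameter that controls how many distal pairs count as contacts.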
id pubmed-9313347
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Biomolecules
Publication date: 2022-06-29
Rights: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
topic Article