Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures
With the debut of AlphaFold2, we can now obtain a highly accurate view of a reasonable equilibrium tertiary structure of a protein molecule. Yet, a single-structure view is insufficient and does not account for the high structural plasticity of protein molecules. Obtaining a multi-structure view of a p...
Main Authors: | Alam, Fardina Fathmiul; Shehu, Amarda |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9313347/ https://www.ncbi.nlm.nih.gov/pubmed/35883464 http://dx.doi.org/10.3390/biom12070908 |
_version_ | 1784754057994305536 |
---|---|
author | Alam, Fardina Fathmiul; Shehu, Amarda |
collection | PubMed |
description | With the debut of AlphaFold2, we can now obtain a highly accurate view of a reasonable equilibrium tertiary structure of a protein molecule. Yet, a single-structure view is insufficient and does not account for the high structural plasticity of protein molecules. Obtaining a multi-structure view of a protein molecule continues to be an outstanding challenge in computational structural biology. In tandem with methods formulated under the umbrella of stochastic optimization, we are now seeing rapid advances in the capabilities of methods based on deep learning. In recent work, we advance the capability of these models to learn from experimentally-available tertiary structures of protein molecules of varying lengths. In this work, we elucidate the important role that the composition of the training dataset plays in the neural network’s ability to learn key local and distal patterns in tertiary structures. To make such patterns visible to the network, we utilize a contact map-based representation of protein tertiary structure. We show how data size, quality, and composition affect the ability of latent variable models to learn key patterns of tertiary structure. In addition, we present a disentangled latent variable model which improves upon the state-of-the-art variational autoencoder-based model in key, physically-realistic structural patterns. We believe this work opens up further avenues of research on deep learning-based models for computing multi-structure views of protein molecules. |
format | Online Article Text |
id | pubmed-9313347 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9313347 2022-07-26 Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures Alam, Fardina Fathmiul; Shehu, Amarda Biomolecules Article MDPI 2022-06-29 /pmc/articles/PMC9313347/ /pubmed/35883464 http://dx.doi.org/10.3390/biom12070908 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9313347/ https://www.ncbi.nlm.nih.gov/pubmed/35883464 http://dx.doi.org/10.3390/biom12070908 |