Graph identification of proteins in tomograms (GRIP‐Tomo)

In this study, we present a method of pattern mining based on network theory that enables the identification of protein structures or complexes from synthetic volume densities, without the knowledge of predefined templates or human biases for refinement. We hypothesized that the topological connecti...


Bibliographic Details
Main authors: George, August, Kim, Doo Nam, Moser, Trevor, Gildea, Ian T., Evans, James E., Cheung, Margaret S.
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9798246/
https://www.ncbi.nlm.nih.gov/pubmed/36482866
http://dx.doi.org/10.1002/pro.4538
author George, August
Kim, Doo Nam
Moser, Trevor
Gildea, Ian T.
Evans, James E.
Cheung, Margaret S.
collection PubMed
description In this study, we present a method of pattern mining based on network theory that enables the identification of protein structures or complexes from synthetic volume densities, without the knowledge of predefined templates or human biases for refinement. We hypothesized that the topological connectivity of protein structures is invariant, and they are distinctive for the purpose of protein identification from distorted data presented in volume densities. Three‐dimensional densities of a protein or a complex from simulated tomographic volumes were transformed into mathematical graphs as observables. We systematically introduced data distortion or defects such as missing fullness of data, the tumbling effect, and the missing wedge effect into the simulated volumes, and varied the distance cutoffs in pixels to capture the varying connectivity between the density cluster centroids in the presence of defects. A similarity score between the graphs from the simulated volumes and the graphs transformed from the physical protein structures in point data was calculated by comparing their network theory order parameters including node degrees, betweenness centrality, and graph densities. By capturing the essential topological features defining the heterogeneous morphologies of a network, we were able to accurately identify proteins and homo‐multimeric complexes from 10 topologically distinctive samples without realistic noise added. Our approach empowers future developments of tomogram processing by providing pattern mining with interpretability, to enable the classification of single‐domain protein native topologies as well as distinct single‐domain proteins from multimeric complexes within noisy volumes.
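The graph comparison outlined in the description can be illustrated with a minimal sketch. It assumes that cluster centroids are already available as 3D points and that two centroids are connected when their distance is within a cutoff. The function names (`build_graph`, `features`, `similarity`) and the scoring formula (one minus the mean relative difference of three summary features) are hypothetical choices made for illustration; this record does not state how the published similarity score is computed.

```python
import math
from collections import deque
from itertools import combinations


def build_graph(points, cutoff):
    """Connect every pair of points whose Euclidean distance is <= cutoff."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def graph_density(adj):
    """Fraction of possible edges that are present (undirected graph)."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 0.0 if n < 2 else 2 * edges / (n * (n - 1))


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def features(adj):
    """Hypothetical summary: mean degree, max normalized betweenness, density."""
    n = len(adj)
    mean_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    norm = (n - 1) * (n - 2) / 2 if n > 2 else 1.0
    return (mean_degree, max(betweenness(adj).values()) / norm, graph_density(adj))


def similarity(adj_a, adj_b):
    """Assumed score in [0, 1]: 1 minus the mean relative feature difference."""
    fa, fb = features(adj_a), features(adj_b)
    rel = [abs(a - b) / max(abs(a), abs(b), 1e-12) for a, b in zip(fa, fb)]
    return 1.0 - sum(rel) / len(rel)


# Toy centroids (arbitrary units): a linear chain and a star.
chain = build_graph([(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)], cutoff=1.2)
star = build_graph([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)], cutoff=1.2)
print(similarity(chain, chain), similarity(chain, star))
```

The chain and the star have the same mean degree and density, so only betweenness centrality separates them in this toy case. That is one reason a comparison might combine several graph measures rather than rely on a single one.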
format Online
Article
Text
id pubmed-9798246
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-9798246 2023-01-05 Protein Sci Full‐length Papers John Wiley & Sons, Inc. 2023-01-01 /pmc/articles/PMC9798246/ /pubmed/36482866 http://dx.doi.org/10.1002/pro.4538 Text en
© 2022 Battelle Memorial Institute. Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society. This is an open access article under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Graph identification of proteins in tomograms (GRIP‐Tomo)
topic Full‐length Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9798246/
https://www.ncbi.nlm.nih.gov/pubmed/36482866
http://dx.doi.org/10.1002/pro.4538