CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting

Bibliographic Details
Main Authors: Damjanovic, Jovan, Murphy, James M., Lin, Yu-Shan
Format: Online Article Text
Language: English
Published: American Chemical Society 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8549068/
https://www.ncbi.nlm.nih.gov/pubmed/34608796
http://dx.doi.org/10.1021/acs.jcim.1c00598
Collection: PubMed
Abstract: Molecular dynamics (MD) simulations are an exceedingly and increasingly potent tool for molecular behavior prediction and analysis. However, the enormous wealth of data generated by these simulations can be difficult to process and render in a human-readable fashion. Cluster analysis is a commonly used way to partition data into structurally distinct states. We present a method that improves on the state of the art by taking advantage of the temporal information of MD trajectories to enable more accurate clustering at a lower memory cost. To date, cluster analysis of MD simulations has generally treated simulation snapshots as a mere collection of independent data points and attempted to separate them into different clusters based on structural similarity. This new method, cluster analysis of trajectories based on segment splitting (CATBOSS), applies density-peak-based clustering to classify trajectory segments learned by change detection. Applying the method to a synthetic toy model as well as four real-life data sets–trajectories of MD simulations of alanine dipeptide and valine dipeptide as well as two fast-folding proteins–we find CATBOSS to be robust and highly performant, yielding natural-looking cluster boundaries and greatly improving clustering resolution. As the classification of points into segments emphasizes density gaps in the data by grouping them close to the state means, CATBOSS applied to the valine dipeptide system is even able to account for a degree of freedom deliberately omitted from the input data set. We also demonstrate the potential utility of CATBOSS in distinguishing metastable states from transition segments as well as promising application to cases where there is little or no advance knowledge of intrinsic coordinates, making for a highly versatile analysis tool.
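The abstract describes a two-stage idea: change detection first cuts a trajectory into segments, and density-peak-based clustering then classifies those segments instead of individual frames. The Python sketch below illustrates that idea only under stated assumptions and is not the authors' CATBOSS code. Binary segmentation with a squared-error cost is swapped in as the change detector, and the clustering step follows the general density-peak recipe of Rodriguez and Laio (a local density ρ, plus a distance δ to the nearest denser point). Every name here (`split_segments`, `density_peak_labels`, `cluster_trajectory`, `make_toy_trajectory`) and every parameter (`penalty`, `min_len`, `dc`, `n_clusters`) is hypothetical.

```python
# Hypothetical sketch of segment-then-cluster analysis; NOT the CATBOSS implementation.
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of frames."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def _sse(points: Sequence[Point]) -> float:
    """Segment cost: summed squared distance of each frame to the segment mean."""
    m = _mean(points)
    return sum(math.dist(p, m) ** 2 for p in points)


def split_segments(traj: Sequence[Point], penalty: float, min_len: int = 5) -> List[Tuple[int, int]]:
    """Stand-in change detector: binary segmentation on a squared-error cost.

    Splits [lo, hi) at the frame that most lowers the total cost, recursing while the
    reduction exceeds `penalty`. Returns half-open (lo, hi) bounds in time order.
    """
    bounds: List[Tuple[int, int]] = []

    def recurse(lo: int, hi: int) -> None:
        total = _sse(traj[lo:hi])
        best_k, best_cost = None, total
        for k in range(lo + min_len, hi - min_len + 1):  # both halves keep >= min_len frames
            cost = _sse(traj[lo:k]) + _sse(traj[k:hi])
            if cost < best_cost:
                best_k, best_cost = k, cost
        if best_k is None or total - best_cost <= penalty:
            bounds.append((lo, hi))  # no worthwhile change point: keep one segment
            return
        recurse(lo, best_k)
        recurse(best_k, hi)

    recurse(0, len(traj))
    return bounds


def density_peak_labels(means: Sequence[Point], weights: Sequence[float],
                        dc: float, n_clusters: int) -> List[int]:
    """Density-peak clustering of segment means, each weighted by its frame count."""
    n = len(means)
    dist = [[math.dist(a, b) for b in means] for a in means]
    # Local density rho: Gaussian kernel of width dc, summed over weighted segments.
    rho = [sum(weights[j] * math.exp(-((dist[i][j] / dc) ** 2)) for j in range(n))
           for i in range(n)]
    order = sorted(range(n), key=lambda i: (-rho[i], i))  # densest first
    delta = [0.0] * n
    parent = [-1] * n  # nearest segment of higher density
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centers: the densest segment plus the others with the largest rho * delta.
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + rest[: n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Non-centers inherit the label of their nearest denser segment, densest first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def cluster_trajectory(traj: Sequence[Point], penalty: float, dc: float,
                       n_clusters: int, min_len: int = 5):
    """Segment the trajectory, cluster the segments, then label every frame."""
    segs = split_segments(traj, penalty, min_len)
    means = [_mean(traj[lo:hi]) for lo, hi in segs]
    weights = [float(hi - lo) for lo, hi in segs]
    seg_labels = density_peak_labels(means, weights, dc, n_clusters)
    frame_labels: List[int] = []
    for (lo, hi), lab in zip(segs, seg_labels):
        frame_labels.extend([lab] * (hi - lo))  # each frame takes its segment's cluster
    return segs, frame_labels


def make_toy_trajectory(seed: int = 0) -> List[Point]:
    """Hypothetical 2-D toy data hopping A -> B -> A -> B with Gaussian noise."""
    rng = random.Random(seed)
    states = [(0.0, 0.0)] * 40 + [(3.0, 3.0)] * 30 + [(0.0, 0.0)] * 40 + [(3.0, 3.0)] * 30
    return [(x + rng.gauss(0, 0.1), y + rng.gauss(0, 0.1)) for x, y in states]
```

For example, `cluster_trajectory(make_toy_trajectory(0), penalty=5.0, dc=1.0, n_clusters=2)` should recover four segments and give both visits to each toy state the same label. Because each segment mean is weighted by its frame count, the clustering step works on a handful of segments rather than on every frame. That is one plausible way segment-level clustering could save memory, though the published method may work differently. Periodic coordinates such as backbone dihedrals would also need a periodic distance, which this sketch does not implement.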
Record ID: pubmed-8549068
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Chem Inf Model
Dates listed in record: 2021-10-05, 2021-10-25, 2021-10-27
License: © 2021 The Authors. Published by American Chemical Society under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/), which permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained.