CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting
Molecular dynamics (MD) simulations are an exceedingly and increasingly potent tool for molecular behavior prediction and analysis. However, the enormous wealth of data generated by these simulations can be difficult to process and render in a human-readable fashion. Cluster analys...
Main Authors: | Damjanovic, Jovan; Murphy, James M.; Lin, Yu-Shan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2021 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8549068/ https://www.ncbi.nlm.nih.gov/pubmed/34608796 http://dx.doi.org/10.1021/acs.jcim.1c00598 |
_version_ | 1784590714069319680 |
---|---|
author | Damjanovic, Jovan Murphy, James M. Lin, Yu-Shan |
author_facet | Damjanovic, Jovan Murphy, James M. Lin, Yu-Shan |
author_sort | Damjanovic, Jovan |
collection | PubMed |
description | Molecular dynamics (MD) simulations are an exceedingly and increasingly potent tool for molecular behavior prediction and analysis. However, the enormous wealth of data generated by these simulations can be difficult to process and render in a human-readable fashion. Cluster analysis is a commonly used way to partition data into structurally distinct states. We present a method that improves on the state of the art by taking advantage of the temporal information of MD trajectories to enable more accurate clustering at a lower memory cost. To date, cluster analysis of MD simulations has generally treated simulation snapshots as a mere collection of independent data points and attempted to separate them into different clusters based on structural similarity. This new method, cluster analysis of trajectories based on segment splitting (CATBOSS), applies density-peak-based clustering to classify trajectory segments learned by change detection. Applying the method to a synthetic toy model as well as four real-life data sets (trajectories of MD simulations of alanine dipeptide and valine dipeptide as well as two fast-folding proteins), we find CATBOSS to be robust and highly performant, yielding natural-looking cluster boundaries and greatly improving clustering resolution. As the classification of points into segments emphasizes density gaps in the data by grouping them close to the state means, CATBOSS applied to the valine dipeptide system is even able to account for a degree of freedom deliberately omitted from the input data set. We also demonstrate the potential utility of CATBOSS in distinguishing metastable states from transition segments as well as promising application to cases where there is little or no advance knowledge of intrinsic coordinates, making for a highly versatile analysis tool. |
format | Online Article Text |
id | pubmed-8549068 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-8549068 2021-10-27 CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting Damjanovic, Jovan Murphy, James M. Lin, Yu-Shan J Chem Inf Model American Chemical Society 2021-10-05 2021-10-25 /pmc/articles/PMC8549068/ /pubmed/34608796 http://dx.doi.org/10.1021/acs.jcim.1c00598 Text en © 2021 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Damjanovic, Jovan Murphy, James M. Lin, Yu-Shan CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting |
title | CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting |
title_full | CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting |
title_fullStr | CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting |
title_full_unstemmed | CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting |
title_short | CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting |
title_sort | catboss: cluster analysis of trajectories based on segment splitting |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8549068/ https://www.ncbi.nlm.nih.gov/pubmed/34608796 http://dx.doi.org/10.1021/acs.jcim.1c00598 |
work_keys_str_mv | AT damjanovicjovan catbossclusteranalysisoftrajectoriesbasedonsegmentsplitting AT murphyjamesm catbossclusteranalysisoftrajectoriesbasedonsegmentsplitting AT linyushan catbossclusteranalysisoftrajectoriesbasedonsegmentsplitting |