Cargando…

PySFD: comprehensive molecular insights from significant feature differences detected among many simulated ensembles

MOTIVATION: Many modeling analyses of molecular dynamics (MD) simulations are based on a definition of states that can be (groups of) clusters of simulation frames in a feature space composed of molecular coordinates. With increasing dimension of this feature space (due to the increasing size or com...

Descripción completa

Detalles Bibliográficos
Autor principal: Stolzenberg, Sebastian
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6499238/
https://www.ncbi.nlm.nih.gov/pubmed/30247628
http://dx.doi.org/10.1093/bioinformatics/bty818
Descripción
Sumario:MOTIVATION: Many modeling analyses of molecular dynamics (MD) simulations are based on a definition of states that can be (groups of) clusters of simulation frames in a feature space composed of molecular coordinates. With increasing dimension of this feature space (due to the increasing size or complexity of a simulated molecule), it becomes very difficult to cluster the underlying MD data and estimate a statistically robust model. To mitigate this “curse of dimensionality”, one can reduce the feature space, e.g., with principal component or time-lagged independent component analysis transformations, focusing the analysis on the most important modes of transitions. In practice, however, all these reduction strategies may neglect important molecular details that are susceptible to experimental verification. RESULTS: To recover such molecular details, I have developed PySFD (Significant Feature Differences analyzer for Python), a multi-processing software package that efficiently selects significantly different features of any user-defined feature type among potentially many different simulated state ensembles, such as meta-stable states of a Markov State Model (MSM). Applying PySFD on MSMs of an aggregate of 300 microseconds MD simulations recently performed on the major histocompatibility complex class II (MHCII) protein, I demonstrate how this toolkit can extract and visualize valuable mechanistic information from big MD simulation data, e.g., in form of networks of dynamic interaction changes connecting functionally relevant sites of a protein complex. AVAILABILITY AND IMPLEMENTATION: PySFD is freely available under the L-GPL license at https://github.com/markovmodel/PySFD. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.