Implementation of 3D spatial indexing and compression in a large-scale molecular dynamics simulation database for rapid atomic contact detection

Bibliographic Details
Main authors: Toofanny, Rudesh D; Simms, Andrew M; Beck, David AC; Daggett, Valerie
Format: Online, Article, Text
Language: English
Published: BioMed Central 2011
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166946/
https://www.ncbi.nlm.nih.gov/pubmed/21831299
http://dx.doi.org/10.1186/1471-2105-12-334
collection PubMed
description BACKGROUND: Molecular dynamics (MD) simulations offer the ability to observe the dynamics and interactions of both whole macromolecules and individual atoms as a function of time. Taken in context with experimental data, atomic interactions from simulation provide insight into the mechanics of protein folding, dynamics, and function. The calculation of atomic interactions or contacts from an MD trajectory is computationally demanding, and the work required grows rapidly with the size of the simulation system (quadratically in the number of atoms for a naive all-pairs search). We describe the implementation of a spatial indexing algorithm in our multi-terabyte MD simulation database that significantly reduces the run-time required for discovery of contacts. The approach is applied to the Dynameomics project data. Spatial indexing, also known as spatial hashing, is a method that divides the simulation space into regularly sized bins and assigns an index to each bin. Since the calculation of contacts is widely employed in the simulation field, we also use it as the basis for testing compression of data tables. We investigate the effects of compressing the trajectory coordinate tables with different data and index compression options within MS SQL Server 2008.
RESULTS: Our implementation of spatial indexing reduces the run-time for calculating contacts over a 1 nanosecond (ns) simulation window by 14% to 90% (i.e., 1.2 to 10.3 times faster). For a 'full' simulation trajectory (51 ns), spatial indexing reduces the calculation run-time by 31% to 81% (1.4 to 5.3 times faster). Compression reduced table sizes but made no significant difference to the total execution time for neighbor discovery. The greatest compression (~36%) was achieved using page-level compression on both the data and the indexes.
CONCLUSIONS: The spatial indexing scheme significantly decreases the time taken to calculate atomic contacts and could be applied to other multidimensional neighbor discovery problems. The speed-up enables on-the-fly calculation and visualization of contacts and rapid cross-simulation analysis for knowledge discovery. Using page compression for the atomic coordinate tables and indexes saves ~36% of disk space without any significant increase in calculation time and should be considered for other non-transactional databases in MS SQL Server 2008.
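The spatial hashing idea described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' SQL Server implementation; it is a minimal Python example under stated assumptions: coordinates are plain (x, y, z) tuples, there are no periodic boundary conditions, and the bin edge length is set equal to the contact cutoff (a hypothetical parameter here). With that bin size, any contact partner of an atom must lie in the atom's own bin or one of its 26 neighboring bins, so only those bins are searched instead of every atom pair.

```python
import math
from collections import defaultdict
from itertools import product

Coord = tuple[float, float, float]


def bin_key(p: Coord, bin_size: float) -> tuple[int, int, int]:
    """Map a 3D point to the integer index of the cubic bin that contains it."""
    return (math.floor(p[0] / bin_size),
            math.floor(p[1] / bin_size),
            math.floor(p[2] / bin_size))


def find_contacts(coords: list[Coord], cutoff: float) -> list[tuple[int, int]]:
    """Return all atom index pairs (i, j), i < j, separated by <= cutoff.

    Assumption: bin edge length == cutoff, and no periodic boundaries.
    """
    # Spatial hash: bin index -> list of atom indices in that bin.
    bins: dict[tuple[int, int, int], list[int]] = defaultdict(list)
    for i, p in enumerate(coords):
        bins[bin_key(p, cutoff)].append(i)

    cutoff_sq = cutoff * cutoff  # compare squared distances, avoiding sqrt
    contacts = []
    for i, p in enumerate(coords):
        bx, by, bz = bin_key(p, cutoff)
        # Visit the atom's own bin and its 26 neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in bins.get((bx + dx, by + dy, bz + dz), ()):
                if j <= i:
                    continue  # report each pair once
                q = coords[j]
                d_sq = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
                if d_sq <= cutoff_sq:
                    contacts.append((i, j))
    return sorted(contacts)


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 4.0, 0.0)]
    print(find_contacts(atoms, cutoff=4.5))  # [(0, 1), (2, 3)]
```

In a database setting, one plausible (hypothetical) adaptation is to store the bin index as indexed columns of the coordinate table, so the neighbor search becomes a lookup over a handful of adjacent bin keys per atom rather than a scan of all atom pairs; the paper's actual schema may differ.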
id pubmed-3166946
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3166946 2011-09-06
BMC Bioinformatics
Methodology Article
BioMed Central 2011-08-10
Copyright ©2011 Toofanny et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article