DCMS: A data analytics and management system for molecular simulation

Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms and intend to observe their spatial and tem...

Bibliographic Details
Main authors: Kumar, Anand, Grupcev, Vladimir, Berrada, Meryem, Fogarty, Joseph C, Tu, Yi-Cheng, Zhu, Xingquan, Pandit, Sagar A, Xia, Yuni
Format: Online Article Text
Language: English
Published: Springer International Publishing 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4456345/
https://www.ncbi.nlm.nih.gov/pubmed/26069879
http://dx.doi.org/10.1186/s40537-014-0009-5
author Kumar, Anand
Grupcev, Vladimir
Berrada, Meryem
Fogarty, Joseph C
Tu, Yi-Cheng
Zhu, Xingquan
Pandit, Sagar A
Xia, Yuni
collection PubMed
description Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms and intend to observe their spatial and temporal relationships for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because they lack a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system that our team has developed over the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is handling the analytical queries, which are often compute-intensive. To address it, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be of interest to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to test other data management issues such as security and compression.
format Online
Article
Text
id pubmed-4456345
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-4456345 2015-06-09 DCMS: A data analytics and management system for molecular simulation Kumar, Anand Grupcev, Vladimir Berrada, Meryem Fogarty, Joseph C Tu, Yi-Cheng Zhu, Xingquan Pandit, Sagar A Xia, Yuni J Big Data Case Study Springer International Publishing 2014-11-26 2015 /pmc/articles/PMC4456345/ /pubmed/26069879 http://dx.doi.org/10.1186/s40537-014-0009-5 Text en © Kumar et al.; licensee Springer. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title DCMS: A data analytics and management system for molecular simulation
topic Case Study
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4456345/
https://www.ncbi.nlm.nih.gov/pubmed/26069879
http://dx.doi.org/10.1186/s40537-014-0009-5