
Data storage architectures to accelerate chemical discovery: data accessibility for individual laboratories and the community

Bibliographic Details
Main authors: Duke, Rebekah; Bhat, Vinayak; Risko, Chad
Format: Online, Article, Text
Language: English
Published: The Royal Society of Chemistry, 2022
Subjects: Chemistry
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710231/
https://www.ncbi.nlm.nih.gov/pubmed/36544717
http://dx.doi.org/10.1039/d2sc05142g
Description: As buzzwords like “big data,” “machine learning,” and “high-throughput” expand through chemistry, chemists need to consider more than ever their data storage, data management, and data accessibility, whether in their own laboratories or with the broader community. While it is commonplace for chemists to use spreadsheets for data storage and analysis, a move towards database architectures ensures that the data can be more readily findable, accessible, interoperable, and reusable (FAIR). However, making this move has several challenges for those with limited-to-no knowledge of computer programming and databases. This Perspective presents basics of data management using databases with a focus on chemical data. We overview database fundamentals by exploring benefits of database use, introducing terminology, and establishing database design principles. We then detail the extract, transform, and load process for database construction, which includes an overview of data parsing and database architectures, spanning Structured Query Language (SQL) and No-SQL structures. We close by cataloging overarching challenges in database design. This Perspective is accompanied by an interactive demonstration available at https://github.com/D3TaLES/databases_demo. We do all of this within the context of chemical data with the aim of equipping chemists with the knowledge and skills to store, manage, and share their data while abiding by FAIR principles.
Journal: Chem Sci, published 2022-11-08
Rights: This journal is © The Royal Society of Chemistry; https://creativecommons.org/licenses/by/3.0/