
BinaryCIF and CIFTools—Lightweight, efficient and extensible macromolecular data management


Bibliographic Details
Main authors: Sehnal, David; Bittrich, Sebastian; Velankar, Sameer; Koča, Jaroslav; Svobodová, Radka; Burley, Stephen K.; Rose, Alexander S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7595629/
https://www.ncbi.nlm.nih.gov/pubmed/33075050
http://dx.doi.org/10.1371/journal.pcbi.1008247
author Sehnal, David
Bittrich, Sebastian
Velankar, Sameer
Koča, Jaroslav
Svobodová, Radka
Burley, Stephen K.
Rose, Alexander S.
author_sort Sehnal, David
collection PubMed
description 3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analysis, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and by fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: a factor of ten versus CIF files and a factor of four versus gzipped CIF files. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
format Online
Article
Text
id pubmed-7595629
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7595629 2020-11-03 PLoS Comput Biol Research Article Public Library of Science 2020-10-19 /pmc/articles/PMC7595629/ /pubmed/33075050 http://dx.doi.org/10.1371/journal.pcbi.1008247 Text en © 2020 Sehnal et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title BinaryCIF and CIFTools—Lightweight, efficient and extensible macromolecular data management
topic Research Article