The GCTx format and cmap{Py, R, M, J} packages: resources for optimized storage and integrated traversal of annotated dense matrices

MOTIVATION: Facilitated by technological improvements, pharmacologic and genetic perturbational datasets have grown in recent years to include millions of experiments. Sharing and publicly distributing these diverse data creates many opportunities for discovery, but in recent years the unprecedented...

Bibliographic Details
Main authors: Enache, Oana M; Lahr, David L; Natoli, Ted E; Litichevskiy, Lev; Wadden, David; Flynn, Corey; Gould, Joshua; Asiedu, Jacob K; Narayan, Rajiv; Subramanian, Aravind
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6477971/
https://www.ncbi.nlm.nih.gov/pubmed/30203022
http://dx.doi.org/10.1093/bioinformatics/bty784
_version_ 1783413113746358272
author Enache, Oana M
Lahr, David L
Natoli, Ted E
Litichevskiy, Lev
Wadden, David
Flynn, Corey
Gould, Joshua
Asiedu, Jacob K
Narayan, Rajiv
Subramanian, Aravind
author_sort Enache, Oana M
collection PubMed
description MOTIVATION: Facilitated by technological improvements, pharmacologic and genetic perturbational datasets have grown in recent years to include millions of experiments. Sharing and publicly distributing these diverse data creates many opportunities for discovery, but in recent years the unprecedented size of data generated and its complex associated metadata have also created data storage and integration challenges. RESULTS: We present the GCTx file format and a suite of open-source packages for the efficient storage, serialization and analysis of dense two-dimensional matrices. We have extensively used the format in the Connectivity Map to assemble and share massive datasets currently comprising 1.3 million experiments, and we anticipate that the format’s generalizability, paired with code libraries that we provide, will lower barriers for integrated cross-assay analysis and algorithm development. AVAILABILITY AND IMPLEMENTATION: Software packages (available in Python, R, Matlab and Java) are freely available at https://github.com/cmap. Additional instructions, tutorials and datasets are available at clue.io/code. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6477971
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6477971 2019-04-25 The GCTx format and cmap{Py, R, M, J} packages: resources for optimized storage and integrated traversal of annotated dense matrices Enache, Oana M Lahr, David L Natoli, Ted E Litichevskiy, Lev Wadden, David Flynn, Corey Gould, Joshua Asiedu, Jacob K Narayan, Rajiv Subramanian, Aravind Bioinformatics Applications Notes MOTIVATION: Facilitated by technological improvements, pharmacologic and genetic perturbational datasets have grown in recent years to include millions of experiments. Sharing and publicly distributing these diverse data creates many opportunities for discovery, but in recent years the unprecedented size of data generated and its complex associated metadata have also created data storage and integration challenges. RESULTS: We present the GCTx file format and a suite of open-source packages for the efficient storage, serialization and analysis of dense two-dimensional matrices. We have extensively used the format in the Connectivity Map to assemble and share massive datasets currently comprising 1.3 million experiments, and we anticipate that the format’s generalizability, paired with code libraries that we provide, will lower barriers for integrated cross-assay analysis and algorithm development. AVAILABILITY AND IMPLEMENTATION: Software packages (available in Python, R, Matlab and Java) are freely available at https://github.com/cmap. Additional instructions, tutorials and datasets are available at clue.io/code. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-04-15 2018-09-10 /pmc/articles/PMC6477971/ /pubmed/30203022 http://dx.doi.org/10.1093/bioinformatics/bty784 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The GCTx format and cmap{Py, R, M, J} packages: resources for optimized storage and integrated traversal of annotated dense matrices
topic Applications Notes