The GCTx format and cmap{Py, R, M, J} packages: resources for optimized storage and integrated traversal of annotated dense matrices
MOTIVATION: Facilitated by technological improvements, pharmacologic and genetic perturbational datasets have grown in recent years to include millions of experiments. Sharing and publicly distributing these diverse data creates many opportunities for discovery, but in recent years the unprecedented...
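Main authors: | Enache, Oana M; Lahr, David L; Natoli, Ted E; Litichevskiy, Lev; Wadden, David; Flynn, Corey; Gould, Joshua; Asiedu, Jacob K; Narayan, Rajiv; Subramanian, Aravind |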
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Applications Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6477971/ https://www.ncbi.nlm.nih.gov/pubmed/30203022 http://dx.doi.org/10.1093/bioinformatics/bty784 |
_version_ | 1783413113746358272 |
---|---|
author | Enache, Oana M Lahr, David L Natoli, Ted E Litichevskiy, Lev Wadden, David Flynn, Corey Gould, Joshua Asiedu, Jacob K Narayan, Rajiv Subramanian, Aravind |
author_sort | Enache, Oana M |
collection | PubMed |
description | MOTIVATION: Facilitated by technological improvements, pharmacologic and genetic perturbational datasets have grown in recent years to include millions of experiments. Sharing and publicly distributing these diverse data creates many opportunities for discovery, but in recent years the unprecedented size of data generated and its complex associated metadata have also created data storage and integration challenges. RESULTS: We present the GCTx file format and a suite of open-source packages for the efficient storage, serialization and analysis of dense two-dimensional matrices. We have extensively used the format in the Connectivity Map to assemble and share massive datasets currently comprising 1.3 million experiments, and we anticipate that the format’s generalizability, paired with code libraries that we provide, will lower barriers for integrated cross-assay analysis and algorithm development. AVAILABILITY AND IMPLEMENTATION: Software packages (available in Python, R, Matlab and Java) are freely available at https://github.com/cmap. Additional instructions, tutorials and datasets are available at clue.io/code. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6477971 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6477971 2019-04-25 The GCTx format and cmap{Py, R, M, J} packages: resources for optimized storage and integrated traversal of annotated dense matrices Enache, Oana M Lahr, David L Natoli, Ted E Litichevskiy, Lev Wadden, David Flynn, Corey Gould, Joshua Asiedu, Jacob K Narayan, Rajiv Subramanian, Aravind Bioinformatics Applications Notes MOTIVATION: Facilitated by technological improvements, pharmacologic and genetic perturbational datasets have grown in recent years to include millions of experiments. Sharing and publicly distributing these diverse data creates many opportunities for discovery, but in recent years the unprecedented size of data generated and its complex associated metadata have also created data storage and integration challenges. RESULTS: We present the GCTx file format and a suite of open-source packages for the efficient storage, serialization and analysis of dense two-dimensional matrices. We have extensively used the format in the Connectivity Map to assemble and share massive datasets currently comprising 1.3 million experiments, and we anticipate that the format’s generalizability, paired with code libraries that we provide, will lower barriers for integrated cross-assay analysis and algorithm development. AVAILABILITY AND IMPLEMENTATION: Software packages (available in Python, R, Matlab and Java) are freely available at https://github.com/cmap. Additional instructions, tutorials and datasets are available at clue.io/code. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-04-15 2018-09-10 /pmc/articles/PMC6477971/ /pubmed/30203022 http://dx.doi.org/10.1093/bioinformatics/bty784 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Notes Enache, Oana M Lahr, David L Natoli, Ted E Litichevskiy, Lev Wadden, David Flynn, Corey Gould, Joshua Asiedu, Jacob K Narayan, Rajiv Subramanian, Aravind The GCTx format and cmap{Py, R, M, J} packages: resources for optimized storage and integrated traversal of annotated dense matrices |
title | The GCTx format and cmap{Py, R, M, J} packages: resources for optimized storage and integrated traversal of annotated dense matrices |
title_sort | gctx format and cmap{py, r, m, j} packages: resources for optimized storage and integrated traversal of annotated dense matrices |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6477971/ https://www.ncbi.nlm.nih.gov/pubmed/30203022 http://dx.doi.org/10.1093/bioinformatics/bty784 |