Matrix and analysis metadata standards (MAMS) to facilitate harmonization and reproducibility of single-cell data

Bibliographic Details
Main authors: Wang, Yichen; Sarfraz, Irzam; Teh, Wei Kheng; Sokolov, Artem; Herb, Brian R.; Creasy, Heather H.; Virshup, Isaac; Dries, Ruben; Degatano, Kylee; Mahurkar, Anup; Schnell, Daniel J; Madrigal, Pedro; Hilton, Jason; Gehlenborg, Nils; Tickle, Timothy; Campbell, Joshua D.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10028847/
https://www.ncbi.nlm.nih.gov/pubmed/36945543
http://dx.doi.org/10.1101/2023.03.06.531314
collection PubMed
description A large number of genomic and imaging datasets are being produced by consortia that seek to characterize healthy and diseased tissues at single-cell resolution. While much effort has been devoted to capturing biospecimen information and experimental procedures, metadata standards that describe data matrices and the analysis workflows that produced them are relatively lacking. Detailed metadata schemas for data analysis are needed to facilitate sharing and interoperability across groups and to promote data provenance for reproducibility. To address this need, we developed the Matrix and Analysis Metadata Standards (MAMS) to serve as a resource for data coordinating centers and tool developers. We first curated several simple and complex “use cases” to characterize the types of feature-observation matrices (FOMs), annotations, and analysis metadata produced in different workflows. Based on these use cases, metadata fields were defined to describe the data contained within each matrix, including fields related to processing, modality, and subsets. Suggested terms were created for the majority of fields to aid in harmonizing metadata terms across groups. Additional provenance metadata fields were defined to describe the software and workflows that produced each FOM. Finally, we developed a simple list-like schema that can be used to store MAMS information and can be implemented in multiple formats. Overall, MAMS can be used as a guide to harmonize analysis-related metadata, which will ultimately facilitate integration of datasets across tools and consortia. MAMS specifications, use cases, and examples can be found at https://github.com/single-cell-mams/mams/.
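To make the "list-like schema" idea in the description concrete, the following is a minimal sketch only, and not the MAMS specification. The field names (`id`, `processing`, `modality`, `obs_subset`, `provenance`, `parent_id`, `tool`), the helper functions, and the example values are all hypothetical and chosen for illustration; the actual fields and suggested terms are defined in the MAMS repository linked above. The sketch shows FOM records stored in a nested dictionary, a provenance lookup that follows parent links, and a JSON round trip as one possible storage format.

```python
import json


def make_fom(fom_id, processing, modality, obs_subset, parent_id=None, tool=None):
    """Build one feature-observation matrix (FOM) record as a plain dict.

    All field names here are illustrative assumptions, not MAMS field names.
    """
    return {
        "id": fom_id,
        "processing": processing,
        "modality": modality,
        "obs_subset": obs_subset,
        "provenance": {"parent_id": parent_id, "tool": tool},
    }


# A list-like record: each FOM is stored under its own identifier.
mams = {
    "FOM": {
        "fom1": make_fom("fom1", "counts", "rna", "full"),
        "fom2": make_fom(
            "fom2",
            "lognormalized",
            "rna",
            "filtered",
            parent_id="fom1",
            tool="example-normalizer 0.1",  # hypothetical tool name
        ),
    }
}


def lineage(record, fom_id):
    """Follow parent links and return the chain of FOM ids, oldest first."""
    chain = []
    while fom_id is not None:
        chain.append(fom_id)
        fom_id = record["FOM"][fom_id]["provenance"]["parent_id"]
    return chain[::-1]


# Plain dicts and lists serialize directly to JSON, one possible format.
serialized = json.dumps(mams, indent=2)
restored = json.loads(serialized)
```

Because the record uses only dictionaries, strings, and nulls, the same structure could in principle be written to other list-like containers; JSON is used here only because it ships with the Python standard library.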
id pubmed-10028847
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10028847 2023-03-22. bioRxiv, Article. Cold Spring Harbor Laboratory, 2023-03-07. Text, en. This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
topic Article