Reproducibility of mass spectrometry based metabolomics data

Bibliographic details
Main authors: Ghosh, Tusharkanti; Philtron, Daisy; Zhang, Weiming; Kechris, Katerina; Ghosh, Debashis
Format: Online article (text)
Language: English
Published: BioMed Central, 2021
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8424977/
https://www.ncbi.nlm.nih.gov/pubmed/34493210
http://dx.doi.org/10.1186/s12859-021-04336-9

Abstract
BACKGROUND: Assessing the reproducibility of measurements is an important first step toward improving the reliability of downstream analyses of high-throughput metabolomics experiments. We define a metabolite as reproducible when it is consistent across replicate experiments; metabolites that are not consistent across replicates are labeled irreproducible. In this work, we introduce and evaluate the use of Maximum Rank Reproducibility (MaRR) to examine reproducibility in mass spectrometry-based metabolomics experiments. We examine reproducibility across technical or biological samples in three different mass spectrometry metabolomics (MS-Metabolomics) data sets.

RESULTS: We apply MaRR, a nonparametric approach that uses a maximal rank statistic to detect the change from reproducible to irreproducible signals. The advantage of MaRR over model-based methods is that it makes no parametric assumptions about the underlying distributions or dependence structures of reproducible metabolites. Using three MS-Metabolomics data sets generated in the multi-center Genetic Epidemiology of Chronic Obstructive Pulmonary Disease (COPD) study, we applied the MaRR procedure after data processing to explore reproducibility across technical or biological samples. Under realistic settings for MS-Metabolomics data, the MaRR procedure effectively controls the false discovery rate (FDR) when the correlation between replicate pairs decreases gradually for lower-ranked signals. Simulation studies also show that the MaRR procedure tends to have high power for detecting reproducible metabolites in most situations, except when the proportion of reproducible metabolites is small. In simulations, the bias of the estimated proportion of reproducible signals (the difference between the estimated and true values) is also close to zero. The real data show a higher level of reproducibility for technical replicates than for biological replicates in all three data sets. In summary, we demonstrate that the MaRR procedure can be adapted to various experimental designs and that this nonparametric approach performs consistently well.

CONCLUSIONS: This research was motivated by the problem of reproducibility, which has proven to be a major obstacle to using genomic findings to advance clinical practice. In this paper, we developed a data-driven approach to assess the reproducibility of MS-Metabolomics data sets. The methods described in this paper are implemented in the open-source R package marr, which is freely available from Bioconductor at http://bioconductor.org/packages/marr.
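
The abstract names the maximal rank statistic without spelling it out. The Python sketch below illustrates the idea under stated assumptions. It takes two replicate intensity vectors for the same n metabolites, ranks each metabolite within each replicate (rank 1 = largest), and computes its maximal rank W_i = max(r_i1, r_i2). If the two rankings were independent, the expected fraction of metabolites with W_i ≤ k would be (k/n)^2, so an excess over that value among top-ranked metabolites points to reproducible signal. The sketch also computes FDR, power, and bias (as the abstract defines it) for a simulated data set with known truth. All function names, the tie-breaking rule, the toy data, and the oracle cut-off are hypothetical. This is not the marr package's code, and it does not implement MaRR's data-driven estimate of where reproducible signals end.

```python
import random


def ranks_descending(values):
    """Rank 1 = largest value; ties are broken by position (an illustrative choice)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def maximal_ranks(rep1, rep2):
    """W_i = max(rank of metabolite i in replicate 1, its rank in replicate 2)."""
    return [max(a, b) for a, b in zip(ranks_descending(rep1), ranks_descending(rep2))]


def excess_over_null(w, k):
    """Observed fraction with W_i <= k minus (k/n)**2, its expected value
    if the two replicate rankings were independent."""
    n = len(w)
    observed = sum(1 for x in w if x <= k) / n
    return observed - (k / n) ** 2


def fdr_power_bias(called, truly_reproducible, pi_hat, pi_true):
    """FDR, power, and bias of the estimated proportion for one simulated data set."""
    tp = len(called & truly_reproducible)
    fp = len(called - truly_reproducible)
    fdr = fp / max(len(called), 1)
    power = tp / max(len(truly_reproducible), 1)
    return fdr, power, pi_hat - pi_true


def toy_replicates(n, n_rep, seed=0):
    """Hypothetical data: the first n_rep metabolites share a strong signal in both
    replicates; the remaining metabolites are independent noise in each replicate."""
    rng = random.Random(seed)
    shared = [5.0 + rng.gauss(0, 1) for _ in range(n_rep)]
    rep1 = [s + rng.gauss(0, 0.2) for s in shared] + [rng.gauss(0, 1) for _ in range(n - n_rep)]
    rep2 = [s + rng.gauss(0, 0.2) for s in shared] + [rng.gauss(0, 1) for _ in range(n - n_rep)]
    return rep1, rep2


if __name__ == "__main__":
    n, n_rep = 1000, 300
    rep1, rep2 = toy_replicates(n, n_rep)
    w = maximal_ranks(rep1, rep2)
    print("excess over independence at k = n_rep:", round(excess_over_null(w, n_rep), 3))
    # Oracle cut-off (uses the true n_rep) purely to illustrate the metrics;
    # MaRR instead detects the cut-off from the data.
    called = {i for i, x in enumerate(w) if x <= n_rep}
    fdr, power, bias = fdr_power_bias(called, set(range(n_rep)), len(called) / n, n_rep / n)
    print(f"FDR={fdr:.3f} power={power:.3f} bias={bias:.3f}")
```

Because the statistic depends only on ranks, applying any strictly increasing transformation to a replicate's intensities leaves W_i unchanged, which is one way to see why a rank-based approach needs no parametric assumptions about the intensity distributions.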

Journal: BMC Bioinformatics (Research article), BioMed Central, published 2021-09-07
© The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.