Reproducibility of mass spectrometry based metabolomics data
Main authors: | Ghosh, Tusharkanti; Philtron, Daisy; Zhang, Weiming; Kechris, Katerina; Ghosh, Debashis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8424977/ https://www.ncbi.nlm.nih.gov/pubmed/34493210 http://dx.doi.org/10.1186/s12859-021-04336-9 |
author | Ghosh, Tusharkanti; Philtron, Daisy; Zhang, Weiming; Kechris, Katerina; Ghosh, Debashis |
---|---|
collection | PubMed |
description | BACKGROUND: Assessing the reproducibility of measurements is an important first step toward improving the reliability of downstream analyses of high-throughput metabolomics experiments. We define a metabolite as reproducible when its measurements are consistent across replicate experiments; metabolites that are not consistent across replicates are labeled irreproducible. In this work, we introduce and evaluate the use of Maximum Rank Reproducibility (MaRR) to examine reproducibility in mass spectrometry-based metabolomics experiments. We examine reproducibility across technical or biological samples in three different mass spectrometry metabolomics (MS-Metabolomics) data sets. RESULTS: We apply MaRR, a nonparametric approach that uses a maximal rank statistic to detect the change from reproducible to irreproducible signals. The advantage of MaRR over model-based methods is that it makes no parametric assumptions about the underlying distributions or dependence structures of reproducible metabolites. Using three MS-Metabolomics data sets generated in the multi-center Genetic Epidemiology of Chronic Obstructive Pulmonary Disease (COPD) study, we applied the MaRR procedure after data processing to explore reproducibility across technical or biological samples. Under realistic settings for MS-Metabolomics data, the MaRR procedure effectively controls the False Discovery Rate (FDR) when the correlation between replicate pairs decreases gradually for less highly ranked signals. Simulation studies also show that the MaRR procedure tends to have high power for detecting reproducible metabolites in most situations, except when the proportion of reproducible metabolites is small. In the simulations, the bias (the difference between the estimated and the true proportion of reproducible signals) is also close to zero.
The results from the real data show a higher level of reproducibility for technical replicates than for biological replicates in all three data sets. In summary, we demonstrate that the MaRR procedure can be adapted to various experimental designs and that the nonparametric approach performs consistently well. CONCLUSIONS: This research was motivated by the problem of reproducibility, which has proven to be a major obstacle to using genomic findings to advance clinical practice. In this paper, we developed a data-driven approach to assess the reproducibility of MS-Metabolomics data sets. The methods described in this paper are implemented in the open-source R package marr, which is freely available from Bioconductor at http://bioconductor.org/packages/marr. |
format | Online Article Text |
id | pubmed-8424977 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8424977 2021-09-10; BMC Bioinformatics, Research; BioMed Central 2021-09-07 /pmc/articles/PMC8424977/ /pubmed/34493210 http://dx.doi.org/10.1186/s12859-021-04336-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Reproducibility of mass spectrometry based metabolomics data |
topic | Research |
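The abstract describes MaRR only at a high level: signals are ranked in each replicate, a maximal rank statistic is used to locate the change from reproducible to irreproducible signals, and the FDR is controlled. The Python sketch below illustrates that maximal-rank idea under simplifying assumptions of our own, not taken from the record: reproducible signals occupy the top ranks in both replicates, irreproducible signals are paired at random over the remaining ranks, the reproducible proportion is chosen by a least-squares fit to the observed counts of maximum ranks (with candidates capped at 0.9), and a plug-in FDR threshold `alpha` selects signals. It is not the marr package's implementation or API, and the published estimator and FDR step may differ in detail. All function names (`max_ranks`, `count_above`, `estimate_pi`, `select_reproducible`, `simulate`) and the simulation settings are hypothetical.

```python
import random


def max_ranks(rep1, rep2):
    """Rank each replicate in descending order (rank 1 = largest value)
    and return the larger of the two ranks for every signal."""
    def desc_ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        ranks = [0] * len(values)
        for position, idx in enumerate(order, start=1):
            ranks[idx] = position
        return ranks

    r1, r2 = desc_ranks(rep1), desc_ranks(rep2)
    return [max(a, b) for a, b in zip(r1, r2)]


def count_above(maxr):
    """count[j] = number of signals whose maximum rank exceeds j, for j = 0..n."""
    n = len(maxr)
    hist = [0] * (n + 1)
    for m in maxr:
        hist[m] += 1
    count = [0] * (n + 1)
    for j in range(n - 1, -1, -1):
        count[j] = count[j + 1] + hist[j + 1]
    return count


def estimate_pi(maxr, cap=0.9):
    """Least-squares change-point estimate of the reproducible proportion.

    Assumption (hypothetical simplification): for a candidate i, signals ranked
    below i are irreproducible and paired at random, so the expected number with
    maximum rank > j (for j >= i) is (n - i) - (j - i)**2 / (n - i)."""
    n = len(maxr)
    count = count_above(maxr)
    best_i, best_loss = 0, float("inf")
    for i in range(int(cap * n) + 1):
        rest = n - i
        loss = 0.0
        for j in range(i, n):
            diff = (count[j] - (rest - (j - i) ** 2 / rest)) / n
            loss += diff * diff
        loss /= rest  # mean squared deviation over the fitted range
        if loss < best_loss:
            best_i, best_loss = i, loss
    return best_i / n, best_i


def select_reproducible(maxr, i_hat, alpha=0.05):
    """Return indices of signals whose maximum rank is at most the largest
    cutoff j whose plug-in FDR estimate is <= alpha."""
    n = len(maxr)
    count = count_above(maxr)
    cutoff = i_hat
    for j in range(i_hat, n):
        expected_false = (j - i_hat) ** 2 / (n - i_hat)
        observed = n - count[j]  # signals with maximum rank <= j
        if observed > 0 and expected_false / observed <= alpha:
            cutoff = j
    return [idx for idx, m in enumerate(maxr) if m <= cutoff]


def simulate(n, pi_true, rng):
    """Hypothetical two-replicate data: the first k signals share a latent
    level (reproducible); the rest are independent noise (irreproducible)."""
    k = round(pi_true * n)
    rep1, rep2 = [], []
    for idx in range(n):
        if idx < k:
            level = rng.uniform(6.0, 10.0)
            rep1.append(level + rng.gauss(0.0, 0.5))
            rep2.append(level + rng.gauss(0.0, 0.5))
        else:
            rep1.append(rng.gauss(0.0, 1.0))
            rep2.append(rng.gauss(0.0, 1.0))
    return rep1, rep2, k


if __name__ == "__main__":
    rng = random.Random(2021)
    x, y, k = simulate(1000, 0.3, rng)
    m = max_ranks(x, y)
    pi_hat, i_hat = estimate_pi(m)
    chosen = select_reproducible(m, i_hat, alpha=0.05)
    print(f"pi_hat={pi_hat:.3f} bias={pi_hat - 0.3:+.3f} selected={len(chosen)}")
```

Running the script prints the estimated proportion, its bias relative to the simulated value of 0.3 (the same notion of bias the abstract uses for its simulations), and the number of selected signals; the exact numbers depend on the seed. The estimator works only with ranks, which mirrors the nonparametric spirit described in the abstract, but the uniform random-pairing model for irreproducible signals is an assumption of this sketch.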