Quantitative Comparison of Statistical Methods for Analyzing Human Metabolomics Data

Bibliographic Details
Main authors: Henglin, Mir, Claggett, Brian L., Antonelli, Joseph, Alotaibi, Mona, Magalang, Gino Alberto, Watrous, Jeramie D., Lagerborg, Kim A., Ovsak, Gavin, Musso, Gabriel, Demler, Olga V., Vasan, Ramachandran S., Larson, Martin G., Jain, Mohit, Cheng, Susan
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9227835/
https://www.ncbi.nlm.nih.gov/pubmed/35736452
http://dx.doi.org/10.3390/metabo12060519
author Henglin, Mir
Claggett, Brian L.
Antonelli, Joseph
Alotaibi, Mona
Magalang, Gino Alberto
Watrous, Jeramie D.
Lagerborg, Kim A.
Ovsak, Gavin
Musso, Gabriel
Demler, Olga V.
Vasan, Ramachandran S.
Larson, Martin G.
Jain, Mohit
Cheng, Susan
author_sort Henglin, Mir
collection PubMed
description Emerging technologies now allow for mass spectrometry-based profiling of thousands of small molecule metabolites (‘metabolomics’) in an increasing number of biosamples. While offering great promise for insight into the pathogenesis of human disease, standard approaches have not yet been established for statistically analyzing increasingly complex, high-dimensional human metabolomics data in relation to clinical phenotypes, including disease outcomes. To determine optimal approaches for analysis, we formally compare traditional and newer statistical learning methods across a range of metabolomics dataset types. In simulated and experimental metabolomics data derived from large population-based human cohorts, we observe that with an increasing number of study subjects, univariate compared to multivariate methods result in an apparently higher false discovery rate as represented by substantial correlation between metabolites directly associated with the outcome and metabolites not associated with the outcome. Although the higher frequency of such associations would not be considered false in the strict statistical sense, it may be considered biologically less informative. In scenarios wherein the number of assayed metabolites increases, as in measures of nontargeted versus targeted metabolomics, multivariate methods performed especially favorably across a range of statistical operating characteristics. In nontargeted metabolomics datasets that included thousands of metabolite measures, sparse multivariate models demonstrated greater selectivity and lower potential for spurious relationships. When the number of metabolites was similar to or exceeded the number of study subjects, as is common with nontargeted metabolomics analysis of relatively small cohorts, sparse multivariate models exhibited the most-robust statistical power with more consistent results. These findings have important implications for metabolomics analysis in human disease.
format Online
Article
Text
id pubmed-9227835
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9227835 2022-06-25 Quantitative Comparison of Statistical Methods for Analyzing Human Metabolomics Data Henglin, Mir Claggett, Brian L. Antonelli, Joseph Alotaibi, Mona Magalang, Gino Alberto Watrous, Jeramie D. Lagerborg, Kim A. Ovsak, Gavin Musso, Gabriel Demler, Olga V. Vasan, Ramachandran S. Larson, Martin G. Jain, Mohit Cheng, Susan Metabolites Article Emerging technologies now allow for mass spectrometry-based profiling of thousands of small molecule metabolites (‘metabolomics’) in an increasing number of biosamples. While offering great promise for insight into the pathogenesis of human disease, standard approaches have not yet been established for statistically analyzing increasingly complex, high-dimensional human metabolomics data in relation to clinical phenotypes, including disease outcomes. To determine optimal approaches for analysis, we formally compare traditional and newer statistical learning methods across a range of metabolomics dataset types. In simulated and experimental metabolomics data derived from large population-based human cohorts, we observe that with an increasing number of study subjects, univariate compared to multivariate methods result in an apparently higher false discovery rate as represented by substantial correlation between metabolites directly associated with the outcome and metabolites not associated with the outcome. Although the higher frequency of such associations would not be considered false in the strict statistical sense, it may be considered biologically less informative. In scenarios wherein the number of assayed metabolites increases, as in measures of nontargeted versus targeted metabolomics, multivariate methods performed especially favorably across a range of statistical operating characteristics. In nontargeted metabolomics datasets that included thousands of metabolite measures, sparse multivariate models demonstrated greater selectivity and lower potential for spurious relationships. When the number of metabolites was similar to or exceeded the number of study subjects, as is common with nontargeted metabolomics analysis of relatively small cohorts, sparse multivariate models exhibited the most-robust statistical power with more consistent results. These findings have important implications for metabolomics analysis in human disease. MDPI 2022-06-04 /pmc/articles/PMC9227835/ /pubmed/35736452 http://dx.doi.org/10.3390/metabo12060519 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Quantitative Comparison of Statistical Methods for Analyzing Human Metabolomics Data
topic Article