Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

Bibliographic Details
Main Authors: Rahnenführer, Jörg, De Bin, Riccardo, Benner, Axel, Ambrogi, Federico, Lusa, Lara, Boulesteix, Anne-Laure, Migliavacca, Eugenia, Binder, Harald, Michiels, Stefan, Sauerbrei, Willi, McShane, Lisa
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10186672/
https://www.ncbi.nlm.nih.gov/pubmed/37189125
http://dx.doi.org/10.1186/s12916-023-02858-y
author Rahnenführer, Jörg
De Bin, Riccardo
Benner, Axel
Ambrogi, Federico
Lusa, Lara
Boulesteix, Anne-Laure
Migliavacca, Eugenia
Binder, Harald
Michiels, Stefan
Sauerbrei, Willi
McShane, Lisa
collection PubMed
description BACKGROUND: In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions.
METHODS: Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 “High-dimensional data” of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD.
RESULTS: The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided.
CONCLUSIONS: This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.
format Online
Article
Text
id pubmed-10186672
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10186672 2023-05-17 BMC Med Guideline BioMed Central 2023-05-15 /pmc/articles/PMC10186672/ /pubmed/37189125 http://dx.doi.org/10.1186/s12916-023-02858-y Text en © This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges
topic Guideline