Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges
Main authors: | Rahnenführer, Jörg; De Bin, Riccardo; Benner, Axel; Ambrogi, Federico; Lusa, Lara; Boulesteix, Anne-Laure; Migliavacca, Eugenia; Binder, Harald; Michiels, Stefan; Sauerbrei, Willi; McShane, Lisa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10186672/ https://www.ncbi.nlm.nih.gov/pubmed/37189125 http://dx.doi.org/10.1186/s12916-023-02858-y |
author | Rahnenführer, Jörg; De Bin, Riccardo; Benner, Axel; Ambrogi, Federico; Lusa, Lara; Boulesteix, Anne-Laure; Migliavacca, Eugenia; Binder, Harald; Michiels, Stefan; Sauerbrei, Willi; McShane, Lisa |
---|---|
collection | PubMed |
description | BACKGROUND: In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions. METHODS: Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 “High-dimensional data” of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. RESULTS: The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided. 
CONCLUSIONS: This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses. |
id | pubmed-10186672 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Med |
published | 2023-05-15 |
rights | © This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply (2023). Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
topic | Guideline |