Minimum standards for evaluating machine-learned models of high-dimensional data
Main author: | Chen, Brian H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Aging |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9513121/ https://www.ncbi.nlm.nih.gov/pubmed/36176975 http://dx.doi.org/10.3389/fragi.2022.901841 |
author | Chen, Brian H. |
---|---|
collection | PubMed |
description | The maturation of machine learning and of technologies that generate high-dimensional data has led to growth in the number of predictive models, such as the “epigenetic clock”. While powerful, machine learning algorithms run a high risk of overfitting, particularly when training data is limited, as is often the case with high-dimensional data (“large p, small n”). Making independent validation a requirement of “algorithmic biomarker” development would bring greater clarity to the field by more efficiently identifying which prediction or classification models to prioritize for further validation and characterization. Reproducibility has been a mainstay of science, but defining its various aspects, and how to apply these principles to machine learning models, has only recently received attention. The goal of this paper is merely to serve as a call to arms for greater rigor and attention paid to newly developed models for prediction or classification. |
format | Online Article Text |
id | pubmed-9513121 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
journal | Front Aging |
publication date | 2022-09-13 |
rights | Copyright © 2022 Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Minimum standards for evaluating machine-learned models of high-dimensional data |
topic | Aging |
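
The abstract's central technical claim can be illustrated with a small simulation. That claim is that flexible models can fit noise when features vastly outnumber samples (“large p, small n”), and that only independent validation exposes this. The sketch below is not the paper's method. It assumes a plain perceptron (no bias term), synthetic Gaussian features, labels drawn at random independently of the features, and illustrative sizes (40 training samples, 500 features, 200 held-out samples) chosen only for this example. Because 40 generic points in 500 dimensions are linearly separable under any labeling, the perceptron can fit the pure-noise training labels perfectly.

```python
import random

def make_noise_data(n, p, rng):
    """Illustrative assumption: Gaussian features with labels drawn independently of them (pure noise)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [rng.choice((-1, 1)) for _ in range(n)]
    return X, y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_perceptron(X, y, max_epochs=1000):
    """Classic perceptron without a bias term; stops after a full pass with no
    mistakes, or after max_epochs passes."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) <= 0:  # misclassified or on the decision boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def accuracy(w, X, y):
    preds = [1 if dot(w, xi) >= 0 else -1 for xi in X]
    return sum(pred == truth for pred, truth in zip(preds, y)) / len(y)

rng = random.Random(0)
n_train, p = 40, 500  # "large p, small n": far more features than samples
X_train, y_train = make_noise_data(n_train, p, rng)
X_test, y_test = make_noise_data(200, p, rng)  # independent validation set

w = train_perceptron(X_train, y_train)
train_acc = accuracy(w, X_train, y_train)
test_acc = accuracy(w, X_test, y_test)
print(f"training accuracy: {train_acc:.2f}")
print(f"independent test accuracy: {test_acc:.2f}")
```

With these settings the training accuracy should reach 1.00, while accuracy on the independent set should sit near chance (about 0.50), since the labels carry no signal. Reporting only the training figure would make a model with no predictive value look perfect.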