
Preventing dataset shift from breaking machine-learning biomarkers

Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Su...


Bibliographic Details
Main Authors: Dockès, Jérôme, Varoquaux, Gaël, Poline, Jean-Baptiste
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8478611/
https://www.ncbi.nlm.nih.gov/pubmed/34585237
http://dx.doi.org/10.1093/gigascience/giab055
author Dockès, Jérôme
Varoquaux, Gaël
Poline, Jean-Baptiste
collection PubMed
description Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning–extracted biomarkers, as well as detection and correction strategies.
format Online
Article
Text
id pubmed-8478611
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8478611 2021-09-29 Preventing dataset shift from breaking machine-learning biomarkers Dockès, Jérôme Varoquaux, Gaël Poline, Jean-Baptiste Gigascience Review Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning–extracted biomarkers, as well as detection and correction strategies. Oxford University Press 2021-09-28 /pmc/articles/PMC8478611/ /pubmed/34585237 http://dx.doi.org/10.1093/gigascience/giab055 Text en © The Author(s) 2021. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Preventing dataset shift from breaking machine-learning biomarkers
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8478611/
https://www.ncbi.nlm.nih.gov/pubmed/34585237
http://dx.doi.org/10.1093/gigascience/giab055