Preventing dataset shift from breaking machine-learning biomarkers
Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Su...
Main authors: | Dockès, Jérôme; Varoquaux, Gaël; Poline, Jean-Baptiste |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8478611/ https://www.ncbi.nlm.nih.gov/pubmed/34585237 http://dx.doi.org/10.1093/gigascience/giab055 |
_version_ | 1784576099527688192 |
---|---|
author | Dockès, Jérôme Varoquaux, Gaël Poline, Jean-Baptiste |
author_facet | Dockès, Jérôme Varoquaux, Gaël Poline, Jean-Baptiste |
author_sort | Dockès, Jérôme |
collection | PubMed |
description | Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning–extracted biomarkers, as well as detection and correction strategies. |
format | Online Article Text |
id | pubmed-8478611 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8478611 2021-09-29 Preventing dataset shift from breaking machine-learning biomarkers Dockès, Jérôme Varoquaux, Gaël Poline, Jean-Baptiste Gigascience Review Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning–extracted biomarkers, as well as detection and correction strategies. Oxford University Press 2021-09-28 /pmc/articles/PMC8478611/ /pubmed/34585237 http://dx.doi.org/10.1093/gigascience/giab055 Text en © The Author(s) 2021. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Review Dockès, Jérôme Varoquaux, Gaël Poline, Jean-Baptiste Preventing dataset shift from breaking machine-learning biomarkers |
title | Preventing dataset shift from breaking machine-learning biomarkers |
title_full | Preventing dataset shift from breaking machine-learning biomarkers |
title_fullStr | Preventing dataset shift from breaking machine-learning biomarkers |
title_full_unstemmed | Preventing dataset shift from breaking machine-learning biomarkers |
title_short | Preventing dataset shift from breaking machine-learning biomarkers |
title_sort | preventing dataset shift from breaking machine-learning biomarkers |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8478611/ https://www.ncbi.nlm.nih.gov/pubmed/34585237 http://dx.doi.org/10.1093/gigascience/giab055 |
work_keys_str_mv | AT dockesjerome preventingdatasetshiftfrombreakingmachinelearningbiomarkers AT varoquauxgael preventingdatasetshiftfrombreakingmachinelearningbiomarkers AT polinejeanbaptiste preventingdatasetshiftfrombreakingmachinelearningbiomarkers |