
Quantifying Overfitting Potential in Drug Binding Datasets

Bibliographic Details
Main Authors: Davis, Brian; Mcloughlin, Kevin; Allen, Jonathan; Ellingson, Sally R.
Format: Online, Article, Text
Language: English
Published: 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304006/
http://dx.doi.org/10.1007/978-3-030-50420-5_44
author Davis, Brian
Mcloughlin, Kevin
Allen, Jonathan
Ellingson, Sally R.
collection PubMed
description In this paper, we investigate potential biases in datasets used for machine-learning-based drug binding prediction. We examine a recently published metric, the Asymmetric Validation Embedding (AVE) bias, which is used to quantify this bias and detect overfitting. We compare it to a slightly revised version and introduce a new weighted metric. We find that the new metrics make it possible to quantify overfitting without overly limiting the training data, and that they yield models with greater predictive value.
format Online
Article
Text
id pubmed-7304006
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7304006 2020-06-19
Computational Science – ICCS 2020
2020-05-22
/pmc/articles/PMC7304006/
http://dx.doi.org/10.1007/978-3-030-50420-5_44
© Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Quantifying Overfitting Potential in Drug Binding Datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304006/
http://dx.doi.org/10.1007/978-3-030-50420-5_44