Quantifying Overfitting Potential in Drug Binding Datasets
Main authors: | Davis, Brian; Mcloughlin, Kevin; Allen, Jonathan; Ellingson, Sally R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304006/ http://dx.doi.org/10.1007/978-3-030-50420-5_44 |
author | Davis, Brian; Mcloughlin, Kevin; Allen, Jonathan; Ellingson, Sally R. |
---|---|
collection | PubMed |
description | In this paper, we investigate potential biases in datasets used for machine-learning predictions of drug binding. We examine a recently published metric, the Asymmetric Validation Embedding (AVE) bias, which is used to quantify this bias and detect overfitting. We compare it to a slightly revised version and introduce a new weighted metric. We find that the new metrics make it possible to quantify overfitting without overly limiting the training data, and that they produce models with greater predictive value. |
format | Online Article Text |
id | pubmed-7304006 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7304006, 2020-06-19; Computational Science – ICCS 2020; Article; 2020-05-22; Text, en; © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Quantifying Overfitting Potential in Drug Binding Datasets |
topic | Article |