
COVER: conformational oversampling as data augmentation for molecules

Training neural networks with small and imbalanced datasets often leads to overfitting and disregard of the minority class. For predictive toxicology, however, models with a good balance between sensitivity and specificity are needed. In this paper we introduce conformational oversampling as a means...


Bibliographic Details
Main authors: Hemmerich, Jennifer, Asilar, Ece, Ecker, Gerhard F.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2020
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7080709/
https://www.ncbi.nlm.nih.gov/pubmed/33430975
http://dx.doi.org/10.1186/s13321-020-00420-z
collection PubMed
description Training neural networks with small and imbalanced datasets often leads to overfitting and disregard of the minority class. For predictive toxicology, however, models with a good balance between sensitivity and specificity are needed. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for prediction of toxicity. Conformational oversampling enhances a dataset by generation of multiple conformations of a molecule. These conformations can be used to balance, as well as oversample a dataset, thereby increasing the dataset size without the need of artificial samples. We show that conformational oversampling facilitates training of neural networks and provides state-of-the-art results on the Tox21 dataset.
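The balancing idea in the description can be illustrated with a minimal sketch. It assumes that each molecule in the largest class receives a fixed number of conformers and that smaller classes receive proportionally more, so that class totals end up roughly equal. The function names `conformers_per_class` and `oversample` and the `base` parameter are hypothetical and not taken from the article. Conformer generation itself is not shown; the integer `conf_id` only stands in for a conformer that a cheminformatics toolkit would produce.

```python
import math
from collections import Counter


def conformers_per_class(labels, base=1):
    """Return how many conformers to generate per molecule in each class.

    Assumption: molecules in the largest class get `base` conformers each;
    smaller classes get proportionally more so class totals roughly match.
    """
    counts = Counter(labels)
    target = max(counts.values()) * base
    return {cls: math.ceil(target / n) for cls, n in counts.items()}


def oversample(molecules, labels, base=1):
    """Expand a dataset into (molecule, conformer index, label) triples.

    The conformer index is a placeholder for a real 3D conformation.
    """
    plan = conformers_per_class(labels, base)
    augmented = []
    for mol, label in zip(molecules, labels):
        for conf_id in range(plan[label]):
            augmented.append((mol, conf_id, label))
    return augmented


# Hypothetical toy data: 9 molecules of class 0 and 3 of class 1.
labels = [0] * 9 + [1] * 3
molecules = [f"mol_{i}" for i in range(len(labels))]
augmented = oversample(molecules, labels, base=2)
class_totals = Counter(label for _, _, label in augmented)
```

With `base=1` the sketch only balances the classes; larger values of `base` also increase the overall dataset size, which corresponds to the two uses named in the description.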
id pubmed-7080709
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7080709 2020-03-23
J Cheminform, Research Article
Springer International Publishing 2020-03-18
/pmc/articles/PMC7080709/ /pubmed/33430975 http://dx.doi.org/10.1186/s13321-020-00420-z
Text en
© The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research Article