
Novel machine learning models to predict endocrine disruption activity for high-throughput chemical screening

Bibliographic Details
Main Authors: Collins, Sean P.; Barton-Maclaren, Tara S.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Toxicology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9530987/
https://www.ncbi.nlm.nih.gov/pubmed/36204696
http://dx.doi.org/10.3389/ftox.2022.981928
author Collins, Sean P.
Barton-Maclaren, Tara S.
collection PubMed
description Endocrine-disrupting chemicals (EDCs) are an area of ongoing concern in toxicology and chemical risk assessment. However, thousands of legacy chemicals lack the toxicity testing required to assess their EDC potential, and this is where computational toxicology can play a crucial role. The United States (US) Environmental Protection Agency (EPA) has run two programs, the Collaborative Estrogen Receptor Activity Project (CERAPP) and the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA), which aim to predict estrogen and androgen activity, respectively. The US EPA solicited research groups from around the world to provide endocrine receptor activity Qualitative (or Quantitative) Structure Activity Relationship ([Q]SAR) models and then combined them to create consensus models for different toxicity endpoints. In this work, Random Forest (RF) models were developed to cover a broader range of substances with high predictive capabilities, using the large CERAPP and CoMPARA datasets for estrogen and androgen activity, respectively. By using simple descriptors from open-source software and large training datasets, the RF models expand the domain of applicability for predicting endocrine-disrupting activity and help in the screening and prioritization of extensive chemical inventories. In addition, the RFs were trained to predict activity conservatively, meaning the models accept more false positives in order to minimize the number of false negatives. This work presents twelve binary and multi-class RF models to predict binding, agonism, and antagonism for estrogen and androgen receptors. The RF models were found to have high predictive capabilities compared to other in silico models, with some models reaching balanced accuracies of 93% while having coverage of 89%. These models are intended to be incorporated into evolving priority-setting workflows and integrated strategies to support the screening and selection of chemicals for further testing and assessment by identifying potential endocrine-disrupting substances.
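The abstract's performance terms can be illustrated with a small sketch. It is not the authors' code: the labels and probabilities are made-up toy values, the function names are hypothetical, and "coverage" is assumed here to mean the fraction of chemicals that fall inside a model's applicability domain, which the abstract does not define. The sketch shows how lowering the decision threshold trades false positives for fewer false negatives, which is one possible reading of "conservative" prediction.

```python
def classify(prob_active, threshold):
    """Label a chemical active (1) when its predicted probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in prob_active]


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels where 1 = active."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (binary case)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def coverage(in_domain_flags):
    """Assumed definition: share of chemicals inside the applicability domain."""
    return sum(in_domain_flags) / len(in_domain_flags)


# Toy data (hypothetical): 3 actives, 5 inactives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
prob_active = [0.90, 0.60, 0.35, 0.40, 0.20, 0.10, 0.05, 0.32]

for threshold in (0.5, 0.3):
    y_pred = classify(prob_active, threshold)
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    print(threshold, "FN:", fn, "FP:", fp,
          "balanced accuracy:", round(balanced_accuracy(y_true, y_pred), 3))
```

On this toy data the lower threshold removes the single false negative at the cost of two false positives, so balanced accuracy drops slightly while no active chemical is missed.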
format Online
Article
Text
id pubmed-9530987
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9530987 2022-10-05 Novel machine learning models to predict endocrine disruption activity for high-throughput chemical screening Collins, Sean P. Barton-Maclaren, Tara S. Front Toxicol Toxicology Frontiers Media S.A. 2022-09-20 /pmc/articles/PMC9530987/ /pubmed/36204696 http://dx.doi.org/10.3389/ftox.2022.981928 Text en Copyright © 2022 Collins and Barton-Maclaren. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Novel machine learning models to predict endocrine disruption activity for high-throughput chemical screening
topic Toxicology