Machine Learning of Toxicological Big Data Enables Read-Across Structure Activity Relationships (RASAR) Outperforming Animal Test Reproducibility

Earlier we created a chemical hazard database via natural language processing of dossiers submitted to the European Chemicals Agency with approximately 10 000 chemicals. We identified repeat OECD guideline tests to establish reproducibility of acute oral and dermal toxicity, eye and skin irritation,...

Bibliographic Details
Main Authors:	Luechtefeld, Thomas; Marsh, Dan; Rowlands, Craig; Hartung, Thomas
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Machine Learning for Read-across Toxicity Testing
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6135638/ https://www.ncbi.nlm.nih.gov/pubmed/30007363 http://dx.doi.org/10.1093/toxsci/kfy152

author	Luechtefeld, Thomas; Marsh, Dan; Rowlands, Craig; Hartung, Thomas
collection	PubMed
description	Earlier we created a chemical hazard database via natural language processing of dossiers submitted to the European Chemicals Agency with approximately 10 000 chemicals. We identified repeat OECD guideline tests to establish reproducibility of acute oral and dermal toxicity, eye and skin irritation, mutagenicity and skin sensitization. Based on 350–700+ chemicals each, the probability that an OECD guideline animal test would output the same result in a repeat test was 78%–96% (sensitivity 50%–87%). An expanded database with more than 866 000 chemical properties/hazards was used as training data and to model health hazards and chemical properties. The constructed models automate and extend the read-across method of chemical classification. The novel models, called RASARs (read-across structure activity relationships), use binary fingerprints and Jaccard distance to define chemical similarity. A large chemical similarity adjacency matrix is constructed from this similarity metric and is used to derive feature vectors for supervised learning. We show results on 9 health hazards from 2 kinds of RASARs: “Simple” and “Data Fusion”. The “Simple” RASAR seeks to duplicate the traditional read-across method, predicting hazard from chemical analogs with known hazard data. The “Data Fusion” RASAR extends this concept by creating large feature vectors from all available property data rather than only the modeled hazard. Simple RASAR models tested in cross-validation achieve 70%–80% balanced accuracies with constraints on tested compounds. Cross-validation of Data Fusion RASARs shows balanced accuracies in the 80%–95% range across 9 health hazards with no constraints on tested compounds.
format	Online Article Text
id	pubmed-6135638
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6135638 2018-09-24 Toxicol Sci Oxford University Press 2018-09 2018-07-11 /pmc/articles/PMC6135638/ /pubmed/30007363 http://dx.doi.org/10.1093/toxsci/kfy152 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of the Society of Toxicology http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Machine Learning of Toxicological Big Data Enables Read-Across Structure Activity Relationships (RASAR) Outperforming Animal Test Reproducibility
topic	Machine Learning for Read-across Toxicity Testing
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6135638/ https://www.ncbi.nlm.nih.gov/pubmed/30007363 http://dx.doi.org/10.1093/toxsci/kfy152
work_keys_str_mv	AT luechtefeldthomas machinelearningoftoxicologicalbigdataenablesreadacrossstructureactivityrelationshipsrasaroutperforminganimaltestreproducibility AT marshdan machinelearningoftoxicologicalbigdataenablesreadacrossstructureactivityrelationshipsrasaroutperforminganimaltestreproducibility AT rowlandscraig machinelearningoftoxicologicalbigdataenablesreadacrossstructureactivityrelationshipsrasaroutperforminganimaltestreproducibility AT hartungthomas machinelearningoftoxicologicalbigdataenablesreadacrossstructureactivityrelationshipsrasaroutperforminganimaltestreproducibility
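
The similarity step named in the description (binary fingerprints compared by Jaccard distance, collected into a similarity matrix from which feature vectors are derived) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the record does not say how features are computed, so the choice of "similarity to the closest known positive and closest known negative analog" is an assumption, and the fingerprints, labels, and function names are hypothetical toy values.

```python
from typing import Dict, FrozenSet, List, Tuple

# A binary fingerprint is represented by the set of bit positions that are 1.
Fingerprint = FrozenSet[int]


def jaccard_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Jaccard distance: 1 - |A intersect B| / |A union B|."""
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / union


def similarity_matrix(fps: List[Fingerprint]) -> List[List[float]]:
    """Pairwise similarity (1 - Jaccard distance) between all fingerprints."""
    return [[1.0 - jaccard_distance(a, b) for b in fps] for a in fps]


def neighbor_features(
    index: int, sims: List[List[float]], labels: Dict[int, bool]
) -> Tuple[float, float]:
    """Assumed feature vector for one chemical: similarity to its closest
    labeled positive and closest labeled negative analog (itself excluded)."""
    best_pos = 0.0
    best_neg = 0.0
    for j, is_hazardous in labels.items():
        if j == index:
            continue
        if is_hazardous:
            best_pos = max(best_pos, sims[index][j])
        else:
            best_neg = max(best_neg, sims[index][j])
    return best_pos, best_neg


# Hypothetical toy data: four chemicals, two of them with known hazard labels.
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3}),
    frozenset({7, 8, 9}),
    frozenset({1, 2, 8, 9}),
]
labels = {0: True, 2: False}

sims = similarity_matrix(fps)
features = [neighbor_features(i, sims, labels) for i in range(len(fps))]
```

Feature pairs like these could then be passed to any supervised classifier. A "Data Fusion" variant in the sense of the description would, under the same assumptions, repeat the neighbor lookup for every available property rather than only the modeled hazard.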
