Global Analysis of Publicly Available Safety Data for 9,801 Substances Registered under REACH from 2008–2014

The European Chemicals Agency (ECHA) warehouses the largest public dataset of in vivo and in vitro toxicity tests. In December 2014 this data was converted into a structured, machine readable and searchable database using natural language processing. It contains data for 9,801 unique substances, 3,6...

Bibliographic Details
Main Authors: Luechtefeld, Thomas; Maertens, Alexandra; Russo, Daniel P.; Rovida, Costanza; Zhu, Hao; Hartung, Thomas
Format: Online Article Text
Language: English
Published: 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408747/
https://www.ncbi.nlm.nih.gov/pubmed/26863090
http://dx.doi.org/10.14573/altex.1510052
author Luechtefeld, Thomas
Maertens, Alexandra
Russo, Daniel P.
Rovida, Costanza
Zhu, Hao
Hartung, Thomas
author_facet Luechtefeld, Thomas
Maertens, Alexandra
Russo, Daniel P.
Rovida, Costanza
Zhu, Hao
Hartung, Thomas
author_sort Luechtefeld, Thomas
collection PubMed
description The European Chemicals Agency (ECHA) warehouses the largest public dataset of in vivo and in vitro toxicity tests. In December 2014 this data was converted into a structured, machine readable and searchable database using natural language processing. It contains data for 9,801 unique substances, 3,609 unique study descriptions and 816,048 study documents. This allows exploring toxicological data on a scale far larger than previously possible. Substance similarity analysis was used to determine clustering of substances for hazards by mapping to PubChem. Similarity was measured using PubChem 2D conformational substructure fingerprints, which were compared via the Tanimoto metric. Following K-Core filtration, the Blondel et al. (2008) module recognition algorithm was used to identify chemical modules showing clusters of substances in use within the chemical universe. The Global Harmonized System of Classification and Labelling provides a valuable information source for hazard analysis. The most prevalent hazards are H317 “May cause an allergic skin reaction” with 20% and H318 “Causes serious eye damage” with 17% positive substances. Such prevalences obtained for all hazards here are key for the design of integrated testing strategies. The data allowed estimation of animal use. The database covers about 20% of substances in the high-throughput biological assay database Tox21 (1,737 substances) and has a 917 substance overlap with the Comparative Toxicogenomics Database (~7% of CTD). The biological data available in these datasets combined with ECHA in vivo endpoints have enormous modeling potential. A case is made that REACH should systematically open regulatory data for research purposes.
format Online
Article
Text
id pubmed-5408747
institution National Center for Biotechnology Information
language English
publishDate 2016
record_format MEDLINE/PubMed
spelling pubmed-5408747 2017-04-28 Global Analysis of Publicly Available Safety Data for 9,801 Substances Registered under REACH from 2008–2014 Luechtefeld, Thomas Maertens, Alexandra Russo, Daniel P. Rovida, Costanza Zhu, Hao Hartung, Thomas ALTEX Article
2016-02-11 2016 /pmc/articles/PMC5408747/ /pubmed/26863090 http://dx.doi.org/10.14573/altex.1510052 Text en http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is appropriately cited.
title Global Analysis of Publicly Available Safety Data for 9,801 Substances Registered under REACH from 2008–2014
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408747/
https://www.ncbi.nlm.nih.gov/pubmed/26863090
http://dx.doi.org/10.14573/altex.1510052