
EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research

The US Environmental Protection Agency’s (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database, launched publicly in 2004, currently exceeds 875 K substances spanning hundreds of lists of interest to EPA and environmental researchers. From its inception, DSSTox has focused curation effor...


Bibliographic Details
Main authors: Grulke, Christopher M., Williams, Antony J., Thillanadarajah, Inthirany, Richard, Ann M.
Format: Online Article Text
Language: English
Published: 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7787967/
https://www.ncbi.nlm.nih.gov/pubmed/33426407
http://dx.doi.org/10.1016/j.comtox.2019.100096
collection PubMed
description The US Environmental Protection Agency’s (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database, launched publicly in 2004, currently exceeds 875 K substances spanning hundreds of lists of interest to EPA and environmental researchers. From its inception, DSSTox has focused curation efforts on resolving chemical identifier errors and conflicts in the public domain towards the goal of assigning accurate chemical structures to data and lists of importance to the environmental research and regulatory community. Accurate structure-data associations, in turn, are necessary inputs to structure-based predictive models supporting hazard and risk assessments. In 2014, the legacy, manually curated DSSTox_V1 content was migrated to a MySQL data model, with modern cheminformatics tools supporting both manual and automated curation processes to increase efficiencies. This was followed by sequential auto-loads of filtered portions of three public datasets: EPA’s Substance Registry Services (SRS), the National Library of Medicine’s ChemID, and PubChem. This process was constrained by a key requirement of uniquely mapped identifiers (i.e., CAS RN, name and structure) for each substance, rejecting content where any two identifiers were conflicted either within or across datasets. This rejected content highlighted the degree of conflicting, inaccurate substance-structure ID mappings in the public domain, ranging from 12% (within EPA SRS) to 49% (across ChemID and PubChem). Substances successfully added to DSSTox from each auto-load were assigned to one of five qc_levels, conveying curator confidence in each dataset. This process enabled a significant expansion of DSSTox content to provide better coverage of the chemical landscape of interest to environmental scientists, while retaining focus on the accuracy of substance-structure-data associations. 
Currently, DSSTox serves as the core foundation of EPA’s CompTox Chemicals Dashboard [https://comptox.epa.gov/dashboard], which provides public access to DSSTox content in support of a broad range of modeling and research activities within EPA and, increasingly, across the field of computational toxicology.
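The unique-mapping requirement described in the abstract can be illustrated with a minimal sketch. It is not the DSSTox loader: the `Record` type, the `split_by_conflict` function, and the sample rows are hypothetical, and identifiers are compared as exact strings, whereas real curation would first normalize names and structures. A record is accepted only if its CAS RN, its name, and its structure each appear in a single (CAS RN, name, structure) combination across all inputs.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical substance record; field names are illustrative only."""
    source: str
    casrn: str
    name: str
    structure: str  # e.g. a SMILES string, compared here as plain text


FIELDS = ("casrn", "name", "structure")


def split_by_conflict(records):
    """Return (accepted, rejected) lists of records.

    A record is rejected when any of its three identifiers also occurs
    in a different (casrn, name, structure) triple, within or across sources.
    """
    # For each identifier value, collect the distinct triples it appears in.
    seen = {field: defaultdict(set) for field in FIELDS}
    for rec in records:
        triple = (rec.casrn, rec.name, rec.structure)
        for field in FIELDS:
            seen[field][getattr(rec, field)].add(triple)

    accepted, rejected = [], []
    for rec in records:
        unique = all(len(seen[f][getattr(rec, f)]) == 1 for f in FIELDS)
        (accepted if unique else rejected).append(rec)
    return accepted, rejected


records = [
    Record("A", "50-00-0", "formaldehyde", "C=O"),
    Record("B", "50-00-0", "formaldehyde", "C=O"),       # sources agree
    Record("A", "71-43-2", "benzene", "c1ccccc1"),
    Record("B", "71-43-2", "benzene", "C1CCCCC1"),       # structure conflict
    Record("A", "64-17-5", "ethanol", "CCO"),
]

accepted, rejected = split_by_conflict(records)
rejection_rate = len(rejected) / len(records)
```

In this toy input the two benzene rows disagree on structure, so both are rejected rather than one being trusted arbitrarily. That mirrors the abstract's description of rejecting content where any two identifiers conflict.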
id pubmed-7787967
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7787967 2021-01-07 Comput Toxicol Article 2019-11-01 /pmc/articles/PMC7787967/ /pubmed/33426407 http://dx.doi.org/10.1016/j.comtox.2019.100096 Text en This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).