Wikipedia on the CompTox Chemicals Dashboard: Connecting Resources to Enrich Public Chemical Data

The online encyclopedia Wikipedia aggregates a large amount of data on chemistry, encompassing well over 20,000 individual Wikipedia pages, and serves the general public as well as the chemistry community. Many other chemical databases and services utilize these data, and previous p...

Bibliographic Details
Main authors: Sinclair, Gabriel, Thillainadarajah, Inthirany, Meyer, Brian, Samano, Vicente, Sivasupramaniam, Sakuntala, Adams, Linda, Willighagen, Egon L., Richard, Ann M., Walker, Martin, Williams, Antony J.
Format: Online Article Text
Language: English
Published: American Chemical Society 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9597659/
https://www.ncbi.nlm.nih.gov/pubmed/36215146
http://dx.doi.org/10.1021/acs.jcim.2c00886
author Sinclair, Gabriel
Thillainadarajah, Inthirany
Meyer, Brian
Samano, Vicente
Sivasupramaniam, Sakuntala
Adams, Linda
Willighagen, Egon L.
Richard, Ann M.
Walker, Martin
Williams, Antony J.
collection PubMed
description The online encyclopedia Wikipedia aggregates a large amount of data on chemistry, encompassing well over 20,000 individual Wikipedia pages, and serves the general public as well as the chemistry community. Many other chemical databases and services utilize these data, and previous projects have focused on methods to index, search, and extract them for review and use. We present a comprehensive effort that combines bulk automated data extraction over tens of thousands of pages, semiautomated data extraction over hundreds of pages, and fine-grained manual extraction of individual lists and compounds of interest. We then correlate these data with the existing contents of the U.S. Environmental Protection Agency’s (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database. This was done for several purposes, including ensuring as complete a mapping as possible between the Dashboard and Wikipedia so that relevant snippets of each article are loaded for the user to review. Conflicts between Dashboard content and Wikipedia, for example in identifiers such as chemical registry numbers, names, and InChIs, and structure-based collisions such as SMILES, were identified and used as the basis for curation of both DSSTox and Wikipedia. This work also allowed us to evaluate available data for sets of chemicals of interest to the Agency, such as synthetic cannabinoids, and to expand the content in DSSTox as appropriate. It further led to improved bidirectional linkage of the detailed chemistry and usage information from Wikipedia with expert-curated structure and identifier data from DSSTox for a new list of nearly 20,000 chemicals. Ultimately, all of this work enhances the data mappings that allow the introduction of the Wikipedia article to be displayed in the community-accessible, web-based EPA CompTox Chemicals Dashboard, improving the experience of the thousands of users per day who access the resource.
format Online
Article
Text
id pubmed-9597659
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-9597659 2022-10-27 Wikipedia on the CompTox Chemicals Dashboard: Connecting Resources to Enrich Public Chemical Data J Chem Inf Model American Chemical Society 2022-10-10 2022-10-24 /pmc/articles/PMC9597659/ /pubmed/36215146 http://dx.doi.org/10.1021/acs.jcim.2c00886 Text en © 2022 The Authors. Published by American Chemical Society. License: https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works.
title Wikipedia on the CompTox Chemicals Dashboard: Connecting Resources to Enrich Public Chemical Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9597659/
https://www.ncbi.nlm.nih.gov/pubmed/36215146
http://dx.doi.org/10.1021/acs.jcim.2c00886