Automated evaluation of consistency within the PubChem Compound database


Bibliographic Details
Main authors: Dashti, Hesam, Wedell, Jonathan R., Westler, William M., Markley, John L., Eghbalnia, Hamid R.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2019
Subjects: Analysis
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380220/
https://www.ncbi.nlm.nih.gov/pubmed/30778259
http://dx.doi.org/10.1038/sdata.2019.23
author Dashti, Hesam
Wedell, Jonathan R.
Westler, William M.
Markley, John L.
Eghbalnia, Hamid R.
collection PubMed
description Identification of discrepant data in aggregated databases is a key step in data curation and remediation. We have applied the ALATIS approach, which is based on the international chemical identifier (InChI) model, to the full PubChem Compound database to generate unique and reproducible compound and atom identifiers for all entries for which three-dimensional structures were available. This exercise also served to identify entries with discrepancies between structures and chemical formulas or InChI strings. The use of unique compound identifiers and atom nomenclature should support more rigorous links between small-molecule databases including those containing atom-specific information of the type available from crystallography and spectroscopy. The comprehensive results from this analysis are publicly available through our webserver [http://alatis.nmrfam.wisc.edu/].
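The record does not describe how ALATIS detects mismatches between a structure and its chemical formula. As a minimal sketch, assuming each three-dimensional structure lists every atom explicitly (hydrogens included) and that formulas are simple strings such as "C2H6O", one such check reduces to comparing element counts. The helpers `parse_formula` and `formula_discrepancy` are hypothetical illustrations, not part of ALATIS or any PubChem API, and the parser deliberately rejects charges, isotopes, and parenthesized groups.

```python
import re
from collections import Counter

# Hypothetical pattern: one element symbol followed by an optional count.
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula like "C2H6O" into element counts.

    Assumption: no charges, isotopes, hydrates, or parentheses; any such
    syntax raises ValueError instead of being silently miscounted.
    """
    counts = Counter()
    pos = 0
    for match in FORMULA_TOKEN.finditer(formula):
        if match.start() != pos:  # skipped characters mean unsupported syntax
            raise ValueError(f"unsupported formula syntax: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if pos != len(formula):  # trailing characters the pattern did not consume
        raise ValueError(f"unsupported formula syntax: {formula!r}")
    return counts


def formula_discrepancy(formula: str, atom_symbols: list[str]) -> dict[str, int]:
    """Return structure-minus-formula counts per element.

    An empty dict means the explicit atom list agrees with the formula.
    """
    expected = parse_formula(formula)
    observed = Counter(atom_symbols)
    diff = {}
    for element in expected.keys() | observed.keys():
        delta = observed[element] - expected[element]  # Counter returns 0 if absent
        if delta:
            diff[element] = delta
    return diff


# Usage: ethanol with one hydrogen missing from the structure.
print(formula_discrepancy("C2H6O", ["C", "C", "O"] + ["H"] * 5))  # {'H': -1}
```

A nonzero entry reports how many more (positive) or fewer (negative) atoms of that element the structure contains than the formula states. The InChI-string comparison mentioned in the abstract is not covered by this sketch.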
id pubmed-6380220
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Sci Data (article type: Analysis). Published online: 2019-02-19.
Rights: Copyright © 2019, The Author(s). http://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
topic Analysis