Automated evaluation of consistency within the PubChem Compound database
Identification of discrepant data in aggregated databases is a key step in data curation and remediation. We have applied the ALATIS approach, which is based on the International Chemical Identifier (InChI) model, to the full PubChem Compound database to generate unique and reproducible compou...
Main authors: | Dashti, Hesam; Wedell, Jonathan R.; Westler, William M.; Markley, John L.; Eghbalnia, Hamid R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2019 |
Subjects: | Analysis |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380220/ https://www.ncbi.nlm.nih.gov/pubmed/30778259 http://dx.doi.org/10.1038/sdata.2019.23 |
author | Dashti, Hesam; Wedell, Jonathan R.; Westler, William M.; Markley, John L.; Eghbalnia, Hamid R. |
collection | PubMed |
description | Identification of discrepant data in aggregated databases is a key step in data curation and remediation. We have applied the ALATIS approach, which is based on the International Chemical Identifier (InChI) model, to the full PubChem Compound database to generate unique and reproducible compound and atom identifiers for all entries for which three-dimensional structures were available. This exercise also served to identify entries with discrepancies between structures and chemical formulas or InChI strings. The use of unique compound identifiers and atom nomenclature should support more rigorous links between small-molecule databases, including those containing atom-specific information of the type available from crystallography and spectroscopy. The comprehensive results from this analysis are publicly available through our webserver [http://alatis.nmrfam.wisc.edu/]. |
format | Online Article Text |
id | pubmed-6380220 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-6380220 2019-02-21 Automated evaluation of consistency within the PubChem Compound database Dashti, Hesam; Wedell, Jonathan R.; Westler, William M.; Markley, John L.; Eghbalnia, Hamid R. Sci Data Analysis Nature Publishing Group 2019-02-19 /pmc/articles/PMC6380220/ /pubmed/30778259 http://dx.doi.org/10.1038/sdata.2019.23 Text en Copyright © 2019, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
title | Automated evaluation of consistency within the PubChem Compound database |
topic | Analysis |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380220/ https://www.ncbi.nlm.nih.gov/pubmed/30778259 http://dx.doi.org/10.1038/sdata.2019.23 |
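The description above notes that the analysis also flagged PubChem entries whose three-dimensional structures disagree with their chemical formulas or InChI strings. As a rough illustration of that kind of consistency check (a simple element-count comparison, not the ALATIS implementation, which this record does not describe), the Python sketch below compares the elements in a structure's atom list against a molecular formula and against the formula layer of a standard InChI. The function names are hypothetical, the atom list is assumed to have already been read from the 3D record (for example, an SDF atom block), and only single-component formulas are handled. Multi-component formulas (with '.') and the InChI proton layer (/p) are out of scope, so charged species may be reported as mismatches.

```python
import re
from collections import Counter
from typing import Iterable, List

# One element symbol (capital letter plus optional lowercase letter)
# followed by an optional integer count, e.g. "C2", "O", "Cl3".
_ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
_SIMPLE_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(formula: str) -> Counter:
    """Return element counts for a simple formula such as 'C2H6O'.

    Assumption: single component, no charges, parentheses, or '.' separators.
    """
    if not _SIMPLE_FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Counter = Counter()
    for symbol, number in _ELEMENT_TOKEN.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def inchi_formula_layer(inchi: str) -> str:
    """Return the formula layer of a standard InChI (the text after 'InChI=1S/')."""
    prefix = "InChI=1S/"
    if not inchi.startswith(prefix):
        raise ValueError("expected a standard InChI starting with 'InChI=1S/'")
    return inchi[len(prefix):].split("/", 1)[0]


def find_discrepancies(structure_atoms: Iterable[str], formula: str, inchi: str) -> List[str]:
    """Compare element counts from a structure with a formula and an InChI.

    Hypothetical check for illustration only: the InChI proton layer (/p)
    is ignored, so charged species may be reported as mismatches.
    """
    from_structure = Counter(structure_atoms)
    issues: List[str] = []
    if from_structure != parse_formula(formula):
        issues.append("structure/formula mismatch")
    if from_structure != parse_formula(inchi_formula_layer(inchi)):
        issues.append("structure/InChI mismatch")
    return issues
```

For ethanol, `find_discrepancies(["C", "C", "O"] + ["H"] * 6, "C2H6O", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")` returns an empty list, while passing the formula "C2H5O" instead reports a structure/formula mismatch. Comparing element multisets with `Counter` keeps the check independent of atom ordering in the structure file.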