ChemProps: A RESTful API enabled database for composite polymer name standardization

Bibliographic Details
Main authors: Hu, Bingyin, Lin, Anqi, Brinson, L. Catherine
Format: Online Article Text
Language: English
Published: Springer International Publishing 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7955638/
https://www.ncbi.nlm.nih.gov/pubmed/33712066
http://dx.doi.org/10.1186/s13321-021-00502-6
author Hu, Bingyin
Lin, Anqi
Brinson, L. Catherine
collection PubMed
description The inconsistency of polymer indexing caused by the lack of uniformity in how polymer names are expressed is a major challenge for widespread use of polymer-related data resources and limits the application of materials informatics to innovation across broad classes of polymer science and polymer-based materials. The current solution of using a variety of different chemical identifiers has proven insufficient to address the challenge and is not intuitive for researchers. This work proposes a multi-algorithm-based mapping methodology entitled ChemProps that is optimized to solve the polymer indexing issue and has an easy-to-update design in both depth and width. A RESTful API is enabled for lightweight data exchange and easy integration across data systems. A weight factor is assigned to each algorithm to generate scores for candidate chemical names, and the weight factors are optimized to maximize the minimum score difference between the ground-truth chemical name and the other candidate chemical names. Ten-fold cross-validation is applied to the 160 training data points to prevent overfitting. The obtained set of weight factors achieves 100% test accuracy on the 54 test data points. The weight factors will evolve as ChemProps grows. With ChemProps, other polymer databases can remove duplicate entries and enable a more accurate “search by SMILES” function by using ChemProps as a common name-to-SMILES translator through API calls. ChemProps is also an excellent tool for auto-populating polymer properties thanks to its easy-to-update design. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-021-00502-6.
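The weighted, multi-algorithm scoring summarized in the description can be illustrated with a small Python sketch. Everything concrete here is an assumption, not ChemProps' actual implementation: the three similarity functions (`exact_match`, `token_overlap`, `prefix_match`), the toy candidate list, the constraint that the weights sum to 1, and the brute-force `grid_search` used as the optimizer are all hypothetical. What the sketch does mirror is the stated objective: for each training name, take the ground-truth candidate's score minus the best competing candidate's score, and choose the weights that maximize the minimum of that margin over the training set.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical similarity "algorithms": each maps (query, candidate) to [0, 1].
# These are illustrative stand-ins, NOT the algorithms used by ChemProps.
def normalize(name: str) -> str:
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def exact_match(query: str, candidate: str) -> float:
    return 1.0 if normalize(query) == normalize(candidate) else 0.0

def token_overlap(query: str, candidate: str) -> float:
    """Jaccard similarity of the alphanumeric tokens in both names."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    c = set(re.findall(r"[a-z0-9]+", candidate.lower()))
    return len(q & c) / len(q | c) if (q | c) else 0.0

def prefix_match(query: str, candidate: str) -> float:
    """Length of the shared normalized prefix relative to the longer name."""
    q, c = normalize(query), normalize(candidate)
    if not q and not c:
        return 0.0
    shared = 0
    for a, b in zip(q, c):
        if a != b:
            break
        shared += 1
    return shared / max(len(q), len(c))

ALGORITHMS: List[Callable[[str, str], float]] = [exact_match, token_overlap, prefix_match]
Weights = Tuple[float, ...]

def score(query: str, candidate: str, weights: Weights) -> float:
    """Weighted sum of the per-algorithm similarities."""
    return sum(w * alg(query, candidate) for w, alg in zip(weights, ALGORITHMS))

def predict(query: str, candidates: Sequence[str], weights: Weights) -> str:
    """Return the highest-scoring standard name for a query."""
    return max(candidates, key=lambda c: score(query, c, weights))

def margin(query: str, truth: str, candidates: Sequence[str], weights: Weights) -> float:
    """Ground-truth score minus the best score among the other candidates."""
    others = [score(query, c, weights) for c in candidates if c != truth]
    return score(query, truth, weights) - max(others, default=0.0)

def objective(data, candidates, weights: Weights) -> float:
    """Minimum margin over all (query, ground-truth) training pairs."""
    return min(margin(q, t, candidates, weights) for q, t in data)

def grid_search(data, candidates, steps: int = 10) -> Tuple[float, Weights]:
    """Brute-force max-min search over weight triples with w1 + w2 + w3 = 1."""
    best: Optional[Tuple[float, Weights]] = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            value = objective(data, candidates, w)
            if best is None or value > best[0]:
                best = (value, w)
    assert best is not None
    return best

# Toy, made-up data: (raw name as written, ground-truth standard name).
candidates = ["polystyrene", "poly(methyl methacrylate)", "polyethylene"]
training = [("Polystyrene", "polystyrene"),
            ("poly methyl methacrylate", "poly(methyl methacrylate)")]
best_value, best_weights = grid_search(training, candidates)
print(best_value, best_weights)
print(predict("poly methyl methacrylate", candidates, best_weights))
```

On this toy data the search keeps a strictly positive minimum margin, so both raw names map to their ground-truth standard names. The ten-fold validation and the 160/54 train/test split described above are not reproduced in this sketch.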
format Online
Article
Text
id pubmed-7955638
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7955638 2021-03-15 ChemProps: A RESTful API enabled database for composite polymer name standardization Hu, Bingyin Lin, Anqi Brinson, L. Catherine J Cheminform Research Article
Springer International Publishing 2021-03-12 /pmc/articles/PMC7955638/ /pubmed/33712066 http://dx.doi.org/10.1186/s13321-021-00502-6 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title ChemProps: A RESTful API enabled database for composite polymer name standardization
topic Research Article