Using Machine Learning to Parse Chemical Mixture Descriptions

Chemical mixtures have recently come to the attention of open standards and data structures for capturing machine-readable descriptions for informatics uses. At the present time, essentially all transmission of information about mixtures is done using short text descriptions that a...

Bibliographic Details
Main authors: Clark, Alex M.; Gedeck, Peter; Cheung, Philip P.; Bunin, Barry A.
Format: Online Article Text
Language: English
Published: American Chemical Society 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8412965/
https://www.ncbi.nlm.nih.gov/pubmed/34497929
http://dx.doi.org/10.1021/acsomega.1c03311
author Clark, Alex M.
Gedeck, Peter
Cheung, Philip P.
Bunin, Barry A.
collection PubMed
description Chemical mixtures have recently come to the attention of open standards and data structures for capturing machine-readable descriptions for informatics uses. At the present time, essentially all transmission of information about mixtures is done using short text descriptions that are readable only by trained scientists, and there are no accessible repositories of marked-up mixture data. We have designed a machine learning tool that can interpret mixture descriptions and upgrade them to the high-level Mixfile format, which can in turn be used to generate Mixtures InChI notation. The interpretation achieves a high success rate and can be used at scale to mark up large catalogs and inventories, with some expert checking to catch edge cases. The training data that was accumulated during the project is made openly available, along with previously released mixture editing tools and utilities.
format Online
Article
Text
id pubmed-8412965
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-8412965 2021-09-07 Using Machine Learning to Parse Chemical Mixture Descriptions. Clark, Alex M.; Gedeck, Peter; Cheung, Philip P.; Bunin, Barry A. ACS Omega. American Chemical Society 2021-08-18 /pmc/articles/PMC8412965/ /pubmed/34497929 http://dx.doi.org/10.1021/acsomega.1c03311 Text en © 2021 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works.
title Using Machine Learning to Parse Chemical Mixture Descriptions
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8412965/
https://www.ncbi.nlm.nih.gov/pubmed/34497929
http://dx.doi.org/10.1021/acsomega.1c03311