Using Machine Learning to Parse Chemical Mixture Descriptions
Main authors: | Clark, Alex M.; Gedeck, Peter; Cheung, Philip P.; Bunin, Barry A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8412965/ https://www.ncbi.nlm.nih.gov/pubmed/34497929 http://dx.doi.org/10.1021/acsomega.1c03311 |
author | Clark, Alex M.; Gedeck, Peter; Cheung, Philip P.; Bunin, Barry A. |
---|---|
collection | PubMed |
description | Chemical mixtures have recently come to the attention of open standards and data structures for capturing machine-readable descriptions for informatics uses. At the present time, essentially all transmission of information about mixtures is done using short text descriptions that are readable only by trained scientists, and there are no accessible repositories of marked-up mixture data. We have designed a machine learning tool that can interpret mixture descriptions and upgrade them to the high-level Mixfile format, which can in turn be used to generate Mixtures InChI notation. The interpretation achieves a high success rate and can be used at scale to mark up large catalogs and inventories, with some expert checking to catch edge cases. The training data that was accumulated during the project is made openly available, along with previously released mixture editing tools and utilities. |
format | Online Article Text |
id | pubmed-8412965 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
journal | ACS Omega |
publication_date | 2021-08-18 |
rights | © 2021 The Authors. Published by American Chemical Society. Licensed under CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works. |
title | Using Machine Learning to Parse Chemical Mixture Descriptions |