Improving molecular representation learning with metric learning-enhanced optimal transport
Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MR...
Main authors: | Wu, Fang; Courty, Nicolas; Jin, Shuting; Li, Stan Z. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10140620/ https://www.ncbi.nlm.nih.gov/pubmed/37123438 http://dx.doi.org/10.1016/j.patter.2023.100714 |
_version_ | 1785033202815991808 |
---|---|
author | Wu, Fang Courty, Nicolas Jin, Shuting Li, Stan Z. |
author_facet | Wu, Fang Courty, Nicolas Jin, Shuting Li, Stan Z. |
author_sort | Wu, Fang |
collection | PubMed |
description | Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances with desired properties. |
format | Online Article Text |
id | pubmed-10140620 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-101406202023-04-29 Improving molecular representation learning with metric learning-enhanced optimal transport Wu, Fang Courty, Nicolas Jin, Shuting Li, Stan Z. Patterns (N Y) Article Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances with desired properties. Elsevier 2023-03-29 /pmc/articles/PMC10140620/ /pubmed/37123438 http://dx.doi.org/10.1016/j.patter.2023.100714 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Wu, Fang Courty, Nicolas Jin, Shuting Li, Stan Z. Improving molecular representation learning with metric learning-enhanced optimal transport |
title | Improving molecular representation learning with metric learning-enhanced optimal transport |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10140620/ https://www.ncbi.nlm.nih.gov/pubmed/37123438 http://dx.doi.org/10.1016/j.patter.2023.100714 |