Improving molecular representation learning with metric learning-enhanced optimal transport

Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MR...

Bibliographic Details
Main authors: Wu, Fang; Courty, Nicolas; Jin, Shuting; Li, Stan Z.
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10140620/
https://www.ncbi.nlm.nih.gov/pubmed/37123438
http://dx.doi.org/10.1016/j.patter.2023.100714
author Wu, Fang
Courty, Nicolas
Jin, Shuting
Li, Stan Z.
collection PubMed
description Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances with desired properties.
format Online
Article
Text
id pubmed-10140620
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10140620 2023-04-29 Improving molecular representation learning with metric learning-enhanced optimal transport Wu, Fang; Courty, Nicolas; Jin, Shuting; Li, Stan Z. Patterns (N Y) Article Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances with desired properties. Elsevier 2023-03-29 /pmc/articles/PMC10140620/ /pubmed/37123438 http://dx.doi.org/10.1016/j.patter.2023.100714 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Improving molecular representation learning with metric learning-enhanced optimal transport
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10140620/
https://www.ncbi.nlm.nih.gov/pubmed/37123438
http://dx.doi.org/10.1016/j.patter.2023.100714