Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science

Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labeled data needed to train the model. This poses severe challenges in data-scar...

Bibliographic Details
Main authors: Loh, Charlotte; Christensen, Thomas; Dangovski, Rumen; Kim, Samuel; Soljačić, Marin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9304370/
https://www.ncbi.nlm.nih.gov/pubmed/35864122
http://dx.doi.org/10.1038/s41467-022-31915-y
author Loh, Charlotte
Christensen, Thomas
Dangovski, Rumen
Kim, Samuel
Soljačić, Marin
collection PubMed
description Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labeled data needed to train the model. This poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Noting that problems in natural sciences often benefit from easily obtainable auxiliary information sources, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three inexpensive and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: abundant unlabeled data, prior knowledge of symmetries or invariances, and surrogate data obtained at near-zero cost. We demonstrate SIB-CL’s effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently results in orders of magnitude reduction in the number of labels needed to achieve the same network accuracies.
format Online
Article
Text
id pubmed-9304370
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9304370 2022-07-23
Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science
Loh, Charlotte; Christensen, Thomas; Dangovski, Rumen; Kim, Samuel; Soljačić, Marin
Nat Commun
Article
Nature Publishing Group UK 2022-07-21
/pmc/articles/PMC9304370/ /pubmed/35864122 http://dx.doi.org/10.1038/s41467-022-31915-y
Text en
© The Author(s) 2022
https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9304370/
https://www.ncbi.nlm.nih.gov/pubmed/35864122
http://dx.doi.org/10.1038/s41467-022-31915-y