Cargando…

FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography

Cryo-electron tomography (Cryo-ET) is an emerging technology for three-dimensional (3D) visualization of macromolecular structures in the near-native state. To recover structures of macromolecules, millions of diverse macromolecules captured in tomograms should be accurately classified into structur...

Descripción completa

Detalles Bibliográficos
Autores principales: Gao, Shan, Zeng, Xiangrui, Xu, Min, Zhang, Fa
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Frontiers Media S.A. 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9294403/
https://www.ncbi.nlm.nih.gov/pubmed/35865006
http://dx.doi.org/10.3389/fmolb.2022.931949
_version_ 1784749846193766400
author Gao, Shan
Zeng, Xiangrui
Xu, Min
Zhang, Fa
author_facet Gao, Shan
Zeng, Xiangrui
Xu, Min
Zhang, Fa
author_sort Gao, Shan
collection PubMed
description Cryo-electron tomography (Cryo-ET) is an emerging technology for three-dimensional (3D) visualization of macromolecular structures in the near-native state. To recover structures of macromolecules, millions of diverse macromolecules captured in tomograms should be accurately classified into structurally homogeneous subsets. Although existing supervised deep learning–based methods have improved classification accuracy, such trained models have limited ability to classify novel macromolecules that are unseen in the training stage. To adapt the trained model to the macromolecule classification of a novel class, massive labeled macromolecules of the novel class are needed. However, data labeling is very time-consuming and labor-intensive. In this work, we propose a novel few-shot learning method for the classification of novel macromolecules (named FSCC). A two-stage training strategy is designed in FSCC to enhance the generalization ability of the model to novel macromolecules. First, FSCC uses contrastive learning to pre-train the model on a sufficient number of labeled macromolecules. Second, FSCC uses distribution calibration to re-train the classifier, enabling the model to classify macromolecules of novel classes (unseen class in the pre-training). Distribution calibration transfers learned knowledge in the pre-training stage to novel macromolecules with limited labeled macromolecules of novel class. Experiments were performed on both synthetic and real datasets. On the synthetic datasets, compared with the state-of-the-art (SOTA) method based on supervised deep learning, FSCC achieves competitive performance. To achieve such performance, FSCC only needs five labeled macromolecules per novel class. However, the SOTA method needs 1100 ∼ 1500 labeled macromolecules per novel class. On the real datasets, FSCC improves the accuracy by 5% ∼ 16% when compared to the baseline model. These demonstrate good generalization ability of contrastive learning and calibration distribution to classify novel macromolecules with very few labeled macromolecules.
format Online
Article
Text
id pubmed-9294403
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-92944032022-07-20 FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography Gao, Shan Zeng, Xiangrui Xu, Min Zhang, Fa Front Mol Biosci Molecular Biosciences Cryo-electron tomography (Cryo-ET) is an emerging technology for three-dimensional (3D) visualization of macromolecular structures in the near-native state. To recover structures of macromolecules, millions of diverse macromolecules captured in tomograms should be accurately classified into structurally homogeneous subsets. Although existing supervised deep learning–based methods have improved classification accuracy, such trained models have limited ability to classify novel macromolecules that are unseen in the training stage. To adapt the trained model to the macromolecule classification of a novel class, massive labeled macromolecules of the novel class are needed. However, data labeling is very time-consuming and labor-intensive. In this work, we propose a novel few-shot learning method for the classification of novel macromolecules (named FSCC). A two-stage training strategy is designed in FSCC to enhance the generalization ability of the model to novel macromolecules. First, FSCC uses contrastive learning to pre-train the model on a sufficient number of labeled macromolecules. Second, FSCC uses distribution calibration to re-train the classifier, enabling the model to classify macromolecules of novel classes (unseen class in the pre-training). Distribution calibration transfers learned knowledge in the pre-training stage to novel macromolecules with limited labeled macromolecules of novel class. Experiments were performed on both synthetic and real datasets. On the synthetic datasets, compared with the state-of-the-art (SOTA) method based on supervised deep learning, FSCC achieves competitive performance. To achieve such performance, FSCC only needs five labeled macromolecules per novel class. However, the SOTA method needs 1100 ∼ 1500 labeled macromolecules per novel class. On the real datasets, FSCC improves the accuracy by 5% ∼ 16% when compared to the baseline model. These demonstrate good generalization ability of contrastive learning and calibration distribution to classify novel macromolecules with very few labeled macromolecules. Frontiers Media S.A. 2022-07-05 /pmc/articles/PMC9294403/ /pubmed/35865006 http://dx.doi.org/10.3389/fmolb.2022.931949 Text en Copyright © 2022 Gao, Zeng, Xu and Zhang. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Molecular Biosciences
Gao, Shan
Zeng, Xiangrui
Xu, Min
Zhang, Fa
FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography
title FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography
title_full FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography
title_fullStr FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography
title_full_unstemmed FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography
title_short FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography
title_sort fscc: few-shot learning for macromolecule classification based on contrastive learning and distribution calibration in cryo-electron tomography
topic Molecular Biosciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9294403/
https://www.ncbi.nlm.nih.gov/pubmed/35865006
http://dx.doi.org/10.3389/fmolb.2022.931949
work_keys_str_mv AT gaoshan fsccfewshotlearningformacromoleculeclassificationbasedoncontrastivelearninganddistributioncalibrationincryoelectrontomography
AT zengxiangrui fsccfewshotlearningformacromoleculeclassificationbasedoncontrastivelearninganddistributioncalibrationincryoelectrontomography
AT xumin fsccfewshotlearningformacromoleculeclassificationbasedoncontrastivelearninganddistributioncalibrationincryoelectrontomography
AT zhangfa fsccfewshotlearningformacromoleculeclassificationbasedoncontrastivelearninganddistributioncalibrationincryoelectrontomography