
Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning

Bibliographic Details
Main authors: Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2017
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5546606/
https://www.ncbi.nlm.nih.gov/pubmed/28786986
http://dx.doi.org/10.1371/journal.pone.0182130
collection PubMed
description Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may misclassify images as the signal-to-noise ratio (SNR) of the image data decreases, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization for a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
id pubmed-5546606
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS One, Research Article (pubmed-5546606, 2017-08-12)
Public Library of Science, 2017-08-07. © 2017 Wu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.