Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning
Main authors: | Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5546606/ https://www.ncbi.nlm.nih.gov/pubmed/28786986 http://dx.doi.org/10.1371/journal.pone.0182130 |
_version_ | 1783255583433949184 |
---|---|
author | Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong |
collection | PubMed |
description | Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes as the signal-to-noise ratio (SNR) of the image data decreases, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization for a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization. |
format | Online Article Text |
id | pubmed-5546606 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5546606 2017-08-12 PLoS One Research Article Public Library of Science 2017-08-07 /pmc/articles/PMC5546606/ /pubmed/28786986 http://dx.doi.org/10.1371/journal.pone.0182130 Text en © 2017 Wu et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5546606/ https://www.ncbi.nlm.nih.gov/pubmed/28786986 http://dx.doi.org/10.1371/journal.pone.0182130 |