
A large expert-curated cryo-EM image dataset for machine learning protein particle picking

Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of biological macromolecular complexes. Picking single-protein particles from cryo-EM micrographs is a crucial step in reconstructing protein structures. However, the widely used template-based particle picking...

Bibliographic Details
Main Authors: Dhakal, Ashwin; Gyawali, Rajan; Wang, Liguo; Cheng, Jianlin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287764/
https://www.ncbi.nlm.nih.gov/pubmed/37349345
http://dx.doi.org/10.1038/s41597-023-02280-2
author Dhakal, Ashwin
Gyawali, Rajan
Wang, Liguo
Cheng, Jianlin
collection PubMed
description Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of biological macromolecular complexes. Picking single-protein particles from cryo-EM micrographs is a crucial step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) based particle picking can potentially automate the process, its development is hindered by lack of large, high-quality labelled training data. To address this bottleneck, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for protein particle picking and analysis. It consists of labelled cryo-EM micrographs (images) of 34 representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). The dataset is 2.6 terabytes and includes 9,893 high-resolution micrographs with labelled protein particle coordinates. The labelling process was rigorously validated through 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of both AI and classical methods for automated cryo-EM protein particle picking.
format Online
Article
Text
id pubmed-10287764
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10287764 2023-06-24 A large expert-curated cryo-EM image dataset for machine learning protein particle picking. Dhakal, Ashwin; Gyawali, Rajan; Wang, Liguo; Cheng, Jianlin. Sci Data. Data Descriptor.
Nature Publishing Group UK 2023-06-22 /pmc/articles/PMC10287764/ /pubmed/37349345 http://dx.doi.org/10.1038/s41597-023-02280-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title A large expert-curated cryo-EM image dataset for machine learning protein particle picking
topic Data Descriptor