A large expert-curated cryo-EM image dataset for machine learning protein particle picking
Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of biological macromolecular complexes. Picking single-protein particles from cryo-EM micrographs is a crucial step in reconstructing protein structures. However, the widely used template-based particle picking...
Main authors: | Dhakal, Ashwin; Gyawali, Rajan; Wang, Liguo; Cheng, Jianlin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | Data Descriptor |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287764/ https://www.ncbi.nlm.nih.gov/pubmed/37349345 http://dx.doi.org/10.1038/s41597-023-02280-2 |
_version_ | 1785061944204460032 |
---|---|
author | Dhakal, Ashwin; Gyawali, Rajan; Wang, Liguo; Cheng, Jianlin |
author_facet | Dhakal, Ashwin; Gyawali, Rajan; Wang, Liguo; Cheng, Jianlin |
author_sort | Dhakal, Ashwin |
collection | PubMed |
description | Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of biological macromolecular complexes. Picking single-protein particles from cryo-EM micrographs is a crucial step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) based particle picking can potentially automate the process, its development is hindered by lack of large, high-quality labelled training data. To address this bottleneck, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for protein particle picking and analysis. It consists of labelled cryo-EM micrographs (images) of 34 representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). The dataset is 2.6 terabytes and includes 9,893 high-resolution micrographs with labelled protein particle coordinates. The labelling process was rigorously validated through 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of both AI and classical methods for automated cryo-EM protein particle picking. |
format | Online Article Text |
id | pubmed-10287764 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10287764 2023-06-24 A large expert-curated cryo-EM image dataset for machine learning protein particle picking Dhakal, Ashwin; Gyawali, Rajan; Wang, Liguo; Cheng, Jianlin Sci Data Data Descriptor Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of biological macromolecular complexes. Picking single-protein particles from cryo-EM micrographs is a crucial step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) based particle picking can potentially automate the process, its development is hindered by lack of large, high-quality labelled training data. To address this bottleneck, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for protein particle picking and analysis. It consists of labelled cryo-EM micrographs (images) of 34 representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). The dataset is 2.6 terabytes and includes 9,893 high-resolution micrographs with labelled protein particle coordinates. The labelling process was rigorously validated through 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of both AI and classical methods for automated cryo-EM protein particle picking. Nature Publishing Group UK 2023-06-22 /pmc/articles/PMC10287764/ /pubmed/37349345 http://dx.doi.org/10.1038/s41597-023-02280-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Data Descriptor Dhakal, Ashwin Gyawali, Rajan Wang, Liguo Cheng, Jianlin A large expert-curated cryo-EM image dataset for machine learning protein particle picking |
title | A large expert-curated cryo-EM image dataset for machine learning protein particle picking |
title_full | A large expert-curated cryo-EM image dataset for machine learning protein particle picking |
title_fullStr | A large expert-curated cryo-EM image dataset for machine learning protein particle picking |
title_full_unstemmed | A large expert-curated cryo-EM image dataset for machine learning protein particle picking |
title_short | A large expert-curated cryo-EM image dataset for machine learning protein particle picking |
title_sort | large expert-curated cryo-em image dataset for machine learning protein particle picking |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287764/ https://www.ncbi.nlm.nih.gov/pubmed/37349345 http://dx.doi.org/10.1038/s41597-023-02280-2 |
work_keys_str_mv | AT dhakalashwin alargeexpertcuratedcryoemimagedatasetformachinelearningproteinparticlepicking AT gyawalirajan alargeexpertcuratedcryoemimagedatasetformachinelearningproteinparticlepicking AT wangliguo alargeexpertcuratedcryoemimagedatasetformachinelearningproteinparticlepicking AT chengjianlin alargeexpertcuratedcryoemimagedatasetformachinelearningproteinparticlepicking AT dhakalashwin largeexpertcuratedcryoemimagedatasetformachinelearningproteinparticlepicking AT gyawalirajan largeexpertcuratedcryoemimagedatasetformachinelearningproteinparticlepicking AT wangliguo largeexpertcuratedcryoemimagedatasetformachinelearningproteinparticlepicking AT chengjianlin largeexpertcuratedcryoemimagedatasetformachinelearningproteinparticlepicking |