Cryo2Struct: A Large Labeled Cryo-EM Density Map Dataset for AI-based Reconstruction of Protein Structures
The advent of single-particle cryogenic electron microscopy (cryo-EM) has brought forth a new era of structural biology, enabling the routine determination of large biological protein complexes and assemblies at atomic resolution. The high-resolution structures of protein complexes and assemblies significantly expedite biomedical research and drug discovery. However, automatically and accurately reconstructing protein structures from high-resolution density maps generated by cryo-EM is still time-consuming and challenging when template structures for the protein chains in a target protein complex are unavailable. Artificial intelligence (AI) methods such as deep learning, when trained on limited amounts of labeled cryo-EM density maps, generate unstable reconstructions. To address this issue, we created a dataset called Cryo2Struct consisting of 7,600 preprocessed cryo-EM density maps whose voxels are labeled according to their corresponding known protein structures, for training and testing AI methods that infer protein structures from density maps. It is larger and of better quality than any existing publicly available dataset. We trained and tested deep learning models on Cryo2Struct to verify that it is ready for the large-scale development of AI methods for reconstructing protein structures from cryo-EM density maps. The source code, data, and instructions to reproduce our results are freely available at https://github.com/BioinfoMachineLearning/cryo2struct.

Main authors: | Giri, Nabin; Cheng, Jianlin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory (bioRxiv), 2023-06-15 |
Collection: | PubMed (National Center for Biotechnology Information); record ID pubmed-10312718; record format MEDLINE/PubMed |
License: | Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows commercial use. |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10312718/; https://www.ncbi.nlm.nih.gov/pubmed/37398020; http://dx.doi.org/10.1101/2023.06.14.545024 |