findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM

Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well characterized protein or appear as a contaminant. Regardless of the source of...

Bibliographic Details
Main Authors: Chojnowski, Grzegorz, Simpkin, Adam J., Leonardo, Diego A., Seifert-Davila, Wolfram, Vivas-Ruiz, Dan E., Keegan, Ronan M., Rigden, Daniel J.
Format: Online Article Text
Language: English
Published: International Union of Crystallography 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8733886/
https://www.ncbi.nlm.nih.gov/pubmed/35059213
http://dx.doi.org/10.1107/S2052252521011088
author Chojnowski, Grzegorz
Simpkin, Adam J.
Leonardo, Diego A.
Seifert-Davila, Wolfram
Vivas-Ruiz, Dan E.
Keegan, Ronan M.
Rigden, Daniel J.
collection PubMed
description Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well characterized protein or appear as a contaminant. Regardless of the source of the problem, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. The method’s application to characterize the crystal structure of an unknown protein purified from a snake venom is presented. It is also shown that the approach can be successfully applied to the identification of protein sequences and validation of sequence assignments in cryo-EM protein structures.
format Online
Article
Text
id pubmed-8733886
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher International Union of Crystallography
record_format MEDLINE/PubMed
spelling pubmed-8733886 2022-01-19 findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM Chojnowski, Grzegorz Simpkin, Adam J. Leonardo, Diego A. Seifert-Davila, Wolfram Vivas-Ruiz, Dan E. Keegan, Ronan M. Rigden, Daniel J. IUCrJ Research Papers Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well characterized protein or appear as a contaminant. Regardless of the source of the problem, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. The method’s application to characterize the crystal structure of an unknown protein purified from a snake venom is presented. It is also shown that the approach can be successfully applied to the identification of protein sequences and validation of sequence assignments in cryo-EM protein structures. International Union of Crystallography 2021-12-01 /pmc/articles/PMC8733886/ /pubmed/35059213 http://dx.doi.org/10.1107/S2052252521011088 Text en © Grzegorz Chojnowski et al. 2022 https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
title findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM
topic Research Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8733886/
https://www.ncbi.nlm.nih.gov/pubmed/35059213
http://dx.doi.org/10.1107/S2052252521011088