Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins

Deep learning-based prediction of protein structure usually begins by constructing a multiple sequence alignment (MSA) containing homologs of the target protein. The most successful approaches combine large feature sets derived from MSAs, and considerable computational effort is spent deriving these...

Bibliographic Details
Main Authors:	Kandathil, Shaun M.; Greener, Joe G.; Lau, Andy M.; Jones, David T.
Format:	Online Article Text
Language:	English
Published:	National Academy of Sciences 2022
Subjects:	Biological Sciences
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8795500/ https://www.ncbi.nlm.nih.gov/pubmed/35074909 http://dx.doi.org/10.1073/pnas.2113348119

_version_	1784641079369269248
author	Kandathil, Shaun M.; Greener, Joe G.; Lau, Andy M.; Jones, David T.
author_facet	Kandathil, Shaun M.; Greener, Joe G.; Lau, Andy M.; Jones, David T.
author_sort	Kandathil, Shaun M.
collection	PubMed
description	Deep learning-based prediction of protein structure usually begins by constructing a multiple sequence alignment (MSA) containing homologs of the target protein. The most successful approaches combine large feature sets derived from MSAs, and considerable computational effort is spent deriving these input features. We present a method that greatly reduces the amount of preprocessing required for a target MSA, while producing main chain coordinates as a direct output of a deep neural network. The network makes use of just three recurrent networks and a stack of residual convolutional layers, making the predictor very fast to run, and easy to install and use. Our approach constructs a directly learned representation of the sequences in an MSA, starting from a one-hot encoding of the sequences. When supplemented with an approximate precision matrix, the learned representation can be used to produce structural models of comparable or greater accuracy as compared to our original DMPfold method, while requiring less than a second to produce a typical model. This level of accuracy and speed allows very large-scale three-dimensional modeling of proteins on minimal hardware, and we demonstrate this by producing models for over 1.3 million uncharacterized regions of proteins extracted from the BFD sequence clusters. After constructing an initial set of approximate models, we select a confident subset of over 30,000 models for further refinement and analysis, revealing putative novel protein folds. We also provide updated models for over 5,000 Pfam families studied in the original DMPfold paper.
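The description above mentions two input steps: a one-hot encoding of the sequences in an MSA, and an approximate precision matrix. The sketch below illustrates both ideas in NumPy. It is not the authors' implementation: the 21-letter alphabet, the function names `one_hot_msa` and `approx_precision`, and the shrinkage-toward-identity regularizer are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus a gap symbol.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_msa(seqs):
    """One-hot encode aligned sequences into shape (n_seqs, length, 21).

    Characters outside the assumed alphabet are mapped to the gap symbol.
    """
    n_seqs, length = len(seqs), len(seqs[0])
    out = np.zeros((n_seqs, length, len(ALPHABET)), dtype=np.float32)
    for s, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError("all sequences in an MSA must have equal length")
        for i, ch in enumerate(seq):
            out[s, i, INDEX.get(ch.upper(), INDEX["-"])] = 1.0
    return out


def approx_precision(onehot, shrinkage=0.1):
    """Approximate precision matrix: inverse of a shrunk covariance.

    The covariance of the flattened one-hot columns is blended with a
    scaled identity (hypothetical regularizer) so that it is invertible.
    """
    n_seqs, length, q = onehot.shape
    dim = length * q
    flat = onehot.reshape(n_seqs, dim).astype(np.float64)
    cov = np.cov(flat, rowvar=False, bias=True)
    # Floor the mean variance so the target is never the zero matrix.
    mean_var = max(np.trace(cov) / dim, 1e-6)
    regularized = (1.0 - shrinkage) * cov + shrinkage * mean_var * np.eye(dim)
    return np.linalg.inv(regularized)


msa = ["ACD-", "ACDE", "GCDE"]
encoded = one_hot_msa(msa)
precision = approx_precision(encoded)
print(encoded.shape, precision.shape)
```

For an alignment of length L, the precision matrix has shape (21·L, 21·L); each 21×21 block relates a pair of alignment columns, which is the kind of pairwise signal that can supplement a learned sequence representation.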
format	Online Article Text
id	pubmed-8795500
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	National Academy of Sciences
record_format	MEDLINE/PubMed
spelling	pubmed-8795500 2022-07-24 Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins Kandathil, Shaun M.; Greener, Joe G.; Lau, Andy M.; Jones, David T. Proc Natl Acad Sci U S A Biological Sciences National Academy of Sciences 2022-01-24 2022-01-25 /pmc/articles/PMC8795500/ /pubmed/35074909 http://dx.doi.org/10.1073/pnas.2113348119 Text en Copyright © 2022 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle	Biological Sciences Kandathil, Shaun M. Greener, Joe G. Lau, Andy M. Jones, David T. Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins
title	Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins
title_sort	ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins
topic	Biological Sciences
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8795500/ https://www.ncbi.nlm.nih.gov/pubmed/35074909 http://dx.doi.org/10.1073/pnas.2113348119
work_keys_str_mv	AT kandathilshaunm ultrafastendtoendproteinstructurepredictionenableshighthroughputexplorationofuncharacterizedproteins AT greenerjoeg ultrafastendtoendproteinstructurepredictionenableshighthroughputexplorationofuncharacterizedproteins AT lauandym ultrafastendtoendproteinstructurepredictionenableshighthroughputexplorationofuncharacterizedproteins AT jonesdavidt ultrafastendtoendproteinstructurepredictionenableshighthroughputexplorationofuncharacterizedproteins
