Highly accurate protein structure prediction for the human proteome

Bibliographic Details
Main authors: Tunyasuvunakool, Kathryn; Adler, Jonas; Wu, Zachary; Green, Tim; Zielinski, Michal; Žídek, Augustin; Bridgland, Alex; Cowie, Andrew; Meyer, Clemens; Laydon, Agata; Velankar, Sameer; Kleywegt, Gerard J.; Bateman, Alex; Evans, Richard; Pritzel, Alexander; Figurnov, Michael; Ronneberger, Olaf; Bates, Russ; Kohl, Simon A. A.; Potapenko, Anna; Ballard, Andrew J.; Romera-Paredes, Bernardino; Nikolov, Stanislav; Jain, Rishub; Clancy, Ellen; Reiman, David; Petersen, Stig; Senior, Andrew W.; Kavukcuoglu, Koray; Birney, Ewan; Kohli, Pushmeet; Jumper, John; Hassabis, Demis
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8387240/
https://www.ncbi.nlm.nih.gov/pubmed/34293799
http://dx.doi.org/10.1038/s41586-021-03828-1
Description
Summary: Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure(1). Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold(2), at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.