The structural coverage of the human proteome before and after AlphaFold

The protein structure field is experiencing a revolution. From the increased throughput of techniques to determine experimental structures, to developments such as cryo-EM that allow us to find the structures of large protein complexes or, more recently, the development of artificial intelligence to...

Bibliographic Details
Main authors: Porta-Pardo, Eduard; Ruiz-Serra, Victoria; Valentini, Samuel; Valencia, Alfonso
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8812986/
https://www.ncbi.nlm.nih.gov/pubmed/35073311
http://dx.doi.org/10.1371/journal.pcbi.1009818
author Porta-Pardo, Eduard
Ruiz-Serra, Victoria
Valentini, Samuel
Valencia, Alfonso
collection PubMed
description The protein structure field is experiencing a revolution. From the increased throughput of techniques to determine experimental structures, to developments such as cryo-EM that allow us to find the structures of large protein complexes or, more recently, the development of artificial intelligence tools, such as AlphaFold, that can predict with high accuracy the folding of proteins for which the availability of homology templates is limited. Here we quantify the effect of the recently released AlphaFold database of protein structural models in our knowledge on human proteins. Our results indicate that our current baseline for structural coverage of 48%, considering experimentally-derived or template-based homology models, elevates up to 76% when including AlphaFold predictions. At the same time the fraction of dark proteome is reduced from 26% to just 10% when AlphaFold models are considered. Furthermore, although the coverage of disease-associated genes and mutations was near complete before AlphaFold release (69% of Clinvar pathogenic mutations and 88% of oncogenic mutations), AlphaFold models still provide an additional coverage of 3% to 13% of these critically important sets of biomedical genes and mutations. Finally, we show how the contribution of AlphaFold models to the structural coverage of non-human organisms, including important pathogenic bacteria, is significantly larger than that of the human proteome. Overall, our results show that the sequence-structure gap of human proteins has almost disappeared, an outstanding success of direct consequences for the knowledge on the human genome and the derived medical applications.
format Online
Article
Text
id pubmed-8812986
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8812986 2022-02-04 The structural coverage of the human proteome before and after AlphaFold Porta-Pardo, Eduard Ruiz-Serra, Victoria Valentini, Samuel Valencia, Alfonso PLoS Comput Biol Research Article
Public Library of Science 2022-01-24 /pmc/articles/PMC8812986/ /pubmed/35073311 http://dx.doi.org/10.1371/journal.pcbi.1009818 Text en © 2022 Porta-Pardo et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title The structural coverage of the human proteome before and after AlphaFold
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8812986/
https://www.ncbi.nlm.nih.gov/pubmed/35073311
http://dx.doi.org/10.1371/journal.pcbi.1009818