
Uncovering new families and folds in the natural protein universe

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database(1). These models cover nearly all proteins that are known, including those challenging to annotate for function or putat...


Bibliographic Details
Main authors: Durairaj, Janani, Waterhouse, Andrew M., Mets, Toomas, Brodiazhenko, Tetiana, Abdullah, Minhal, Studer, Gabriel, Tauriello, Gerardo, Akdel, Mehmet, Andreeva, Antonina, Bateman, Alex, Tenson, Tanel, Hauryliuk, Vasili, Schwede, Torsten, Pereira, Joana
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584680/
https://www.ncbi.nlm.nih.gov/pubmed/37704037
http://dx.doi.org/10.1038/s41586-023-06622-3
_version_ 1785122791968735232
author Durairaj, Janani
Waterhouse, Andrew M.
Mets, Toomas
Brodiazhenko, Tetiana
Abdullah, Minhal
Studer, Gabriel
Tauriello, Gerardo
Akdel, Mehmet
Andreeva, Antonina
Bateman, Alex
Tenson, Tanel
Hauryliuk, Vasili
Schwede, Torsten
Pereira, Joana
author_sort Durairaj, Janani
collection PubMed
description We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database (1). These models cover nearly all known proteins, including those that are challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this ‘dark matter’ of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database (2) and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin–antitoxin systems, TumE–TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light on uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.
format Online
Article
Text
id pubmed-10584680
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-105846802023-10-20 Uncovering new families and folds in the natural protein universe Durairaj, Janani Waterhouse, Andrew M. Mets, Toomas Brodiazhenko, Tetiana Abdullah, Minhal Studer, Gabriel Tauriello, Gerardo Akdel, Mehmet Andreeva, Antonina Bateman, Alex Tenson, Tanel Hauryliuk, Vasili Schwede, Torsten Pereira, Joana Nature Article We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database(1). These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this ‘dark matter’ of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to Pfam database(2) and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin–antitoxin systems, TumE–TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light into uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology. 
Nature Publishing Group UK 2023-09-13 2023 /pmc/articles/PMC10584680/ /pubmed/37704037 http://dx.doi.org/10.1038/s41586-023-06622-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) .
title Uncovering new families and folds in the natural protein universe
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584680/
https://www.ncbi.nlm.nih.gov/pubmed/37704037
http://dx.doi.org/10.1038/s41586-023-06622-3