Uncovering new families and folds in the natural protein universe
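Main authors: | Durairaj, Janani; Waterhouse, Andrew M.; Mets, Toomas; Brodiazhenko, Tetiana; Abdullah, Minhal; Studer, Gabriel; Tauriello, Gerardo; Akdel, Mehmet; Andreeva, Antonina; Bateman, Alex; Tenson, Tanel; Hauryliuk, Vasili; Schwede, Torsten; Pereira, Joana |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584680/ https://www.ncbi.nlm.nih.gov/pubmed/37704037 http://dx.doi.org/10.1038/s41586-023-06622-3 |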
author | Durairaj, Janani; Waterhouse, Andrew M.; Mets, Toomas; Brodiazhenko, Tetiana; Abdullah, Minhal; Studer, Gabriel; Tauriello, Gerardo; Akdel, Mehmet; Andreeva, Antonina; Bateman, Alex; Tenson, Tanel; Hauryliuk, Vasili; Schwede, Torsten; Pereira, Joana |
collection | PubMed |
description | We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database(1). These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this ‘dark matter’ of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database(2) and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin–antitoxin systems, TumE–TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light on uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology. |
format | Online Article Text |
id | pubmed-10584680 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
published online | 2023-09-13 |
rights | © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
title | Uncovering new families and folds in the natural protein universe |
topic | Article |