
Transfer learning: The key to functionally annotate the protein universe

The automatic annotation of the protein universe is still an unresolved challenge. Today, there are 229,149,489 entries in the UniProtKB database, but only 0.25% of them have been functionally annotated. This manual process integrates knowledge from the protein families database Pfam, annotating fam...

Bibliographic Details
Main authors: Bugnon, Leandro A., Fenoy, Emilio, Edera, Alejandro A., Raad, Jonathan, Stegmayer, Georgina, Milone, Diego H.
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9982298/
https://www.ncbi.nlm.nih.gov/pubmed/36873903
http://dx.doi.org/10.1016/j.patter.2023.100691
_version_ 1784900300741541888
author Bugnon, Leandro A.
Fenoy, Emilio
Edera, Alejandro A.
Raad, Jonathan
Stegmayer, Georgina
Milone, Diego H.
author_sort Bugnon, Leandro A.
collection PubMed
description The automatic annotation of the protein universe is still an unresolved challenge. Today, there are 229,149,489 entries in the UniProtKB database, but only 0.25% of them have been functionally annotated. This manual process integrates knowledge from the protein families database Pfam, annotating family domains using sequence alignments and hidden Markov models. This approach has grown the Pfam annotations at a low rate in the last years. Recently, deep learning models appeared with the capability of learning evolutionary patterns from unaligned protein sequences. However, this requires large-scale data, while many families contain just a few sequences. Here, we contend this limitation can be overcome by transfer learning, exploiting the full potential of self-supervised learning on large unannotated data and then supervised learning on a small labeled dataset. We show results where errors in protein family prediction can be reduced by 55% with respect to standard methods.
format Online
Article
Text
id pubmed-9982298
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9982298 2023-03-04 Transfer learning: The key to functionally annotate the protein universe Bugnon, Leandro A. Fenoy, Emilio Edera, Alejandro A. Raad, Jonathan Stegmayer, Georgina Milone, Diego H. Patterns (N Y) Opinion The automatic annotation of the protein universe is still an unresolved challenge. Today, there are 229,149,489 entries in the UniProtKB database, but only 0.25% of them have been functionally annotated. This manual process integrates knowledge from the protein families database Pfam, annotating family domains using sequence alignments and hidden Markov models. This approach has grown the Pfam annotations at a low rate in the last years. Recently, deep learning models appeared with the capability of learning evolutionary patterns from unaligned protein sequences. However, this requires large-scale data, while many families contain just a few sequences. Here, we contend this limitation can be overcome by transfer learning, exploiting the full potential of self-supervised learning on large unannotated data and then supervised learning on a small labeled dataset. We show results where errors in protein family prediction can be reduced by 55% with respect to standard methods. Elsevier 2023-02-10 /pmc/articles/PMC9982298/ /pubmed/36873903 http://dx.doi.org/10.1016/j.patter.2023.100691 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Transfer learning: The key to functionally annotate the protein universe
topic Opinion