DeepLoc 2.0: multi-label subcellular localization prediction using protein language models

Bibliographic Details
Main authors: Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Nielsen, Henrik; Winther, Ole
Format: Online article, text
Language: English
Published: Oxford University Press, 2022
Subjects: Web Server Issue
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9252801/
https://www.ncbi.nlm.nih.gov/pubmed/35489069
http://dx.doi.org/10.1093/nar/gkac278

Abstract: The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.
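The abstract combines two ideas that can be illustrated together: pooling per-residue language-model embeddings with attention, whose weights give one value per sequence position, and multi-label prediction, where each compartment receives its own independent probability so a protein can be assigned to more than one location. The following is a minimal Python sketch under stated assumptions; it is not DeepLoc 2.0's architecture or code. The embeddings, the query vector, the compartment names, the weights and biases, and the per-class thresholds are all hypothetical toy values. A real pipeline would obtain per-residue embeddings from a pre-trained protein language model and learn these parameters from data.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attention_pool(embeddings: Sequence[Vector], query: Vector) -> Tuple[List[float], List[float]]:
    """Pool per-residue embeddings (L x D) into a single D-dim vector.

    Each residue is scored against a hypothetical learned query vector; the
    softmax of those scores gives one weight per sequence position, which can
    be inspected as a per-residue attention profile.
    """
    weights = softmax([dot(e, query) for e in embeddings])
    dim = len(embeddings[0])
    pooled = [sum(w * e[d] for w, e in zip(weights, embeddings)) for d in range(dim)]
    return pooled, weights


def predict_labels(
    pooled: Vector,
    class_weights: Dict[str, Vector],
    biases: Dict[str, float],
    thresholds: Dict[str, float],
) -> Tuple[Dict[str, float], List[str]]:
    """Multi-label decision: an independent sigmoid per compartment, each
    compared with its own (assumed) threshold, so several labels can pass."""
    probs: Dict[str, float] = {}
    labels: List[str] = []
    for name, w in class_weights.items():
        p = sigmoid(dot(pooled, w) + biases[name])
        probs[name] = p
        if p >= thresholds[name]:
            labels.append(name)
    return probs, labels


# Toy inputs (all hypothetical): 3 residues with 2-dim embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
query = [1.0, 0.0]
class_weights = {"Nucleus": [2.0, 0.0], "Cytoplasm": [0.0, 1.0], "Mitochondrion": [1.0, 0.0]}
biases = {"Nucleus": -1.0, "Cytoplasm": -1.0, "Mitochondrion": 0.0}
thresholds = {"Nucleus": 0.5, "Cytoplasm": 0.5, "Mitochondrion": 0.5}

pooled, weights = attention_pool(embeddings, query)
probs, labels = predict_labels(pooled, class_weights, biases, thresholds)
print("attention weights:", [round(w, 3) for w in weights])
print("probabilities:", {k: round(v, 3) for k, v in probs.items()})
print("predicted locations:", labels)
```

In this toy run, Nucleus and Mitochondrion both clear their thresholds while Cytoplasm does not, and the third residue receives the largest attention weight. Scoring each compartment with its own sigmoid, rather than a single softmax across compartments, is what allows more than one location to be predicted; a softmax would force the compartment probabilities to sum to one and compete.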

Journal: Nucleic Acids Research, Web Server Issue; Oxford University Press, 2022-04-30
Record: pubmed-9252801 (MEDLINE/PubMed format), National Center for Biotechnology Information
Copyright: © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.