DeepLoc 2.0: multi-label subcellular localization prediction using protein language models
The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and...
Main authors: | Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Nielsen, Henrik; Winther, Ole |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Web Server Issue |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9252801/ https://www.ncbi.nlm.nih.gov/pubmed/35489069 http://dx.doi.org/10.1093/nar/gkac278 |
_version_ | 1784740352093061120 |
---|---|
author | Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Nielsen, Henrik; Winther, Ole |
author_facet | Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Nielsen, Henrik; Winther, Ole |
author_sort | Thumuluri, Vineet |
collection | PubMed |
description | The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0. |
format | Online Article Text |
id | pubmed-9252801 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9252801 2022-07-05 DeepLoc 2.0: multi-label subcellular localization prediction using protein language models Thumuluri, Vineet Almagro Armenteros, José Juan Johansen, Alexander Rosenberg Nielsen, Henrik Winther, Ole Nucleic Acids Res Web Server Issue The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0. Oxford University Press 2022-04-30 /pmc/articles/PMC9252801/ /pubmed/35489069 http://dx.doi.org/10.1093/nar/gkac278 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Web Server Issue Thumuluri, Vineet Almagro Armenteros, José Juan Johansen, Alexander Rosenberg Nielsen, Henrik Winther, Ole DeepLoc 2.0: multi-label subcellular localization prediction using protein language models |
title | DeepLoc 2.0: multi-label subcellular localization prediction using protein language models |
title_full | DeepLoc 2.0: multi-label subcellular localization prediction using protein language models |
title_fullStr | DeepLoc 2.0: multi-label subcellular localization prediction using protein language models |
title_full_unstemmed | DeepLoc 2.0: multi-label subcellular localization prediction using protein language models |
title_short | DeepLoc 2.0: multi-label subcellular localization prediction using protein language models |
title_sort | deeploc 2.0: multi-label subcellular localization prediction using protein language models |
topic | Web Server Issue |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9252801/ https://www.ncbi.nlm.nih.gov/pubmed/35489069 http://dx.doi.org/10.1093/nar/gkac278 |
work_keys_str_mv | AT thumulurivineet deeploc20multilabelsubcellularlocalizationpredictionusingproteinlanguagemodels AT almagroarmenterosjosejuan deeploc20multilabelsubcellularlocalizationpredictionusingproteinlanguagemodels AT johansenalexanderrosenberg deeploc20multilabelsubcellularlocalizationpredictionusingproteinlanguagemodels AT nielsenhenrik deeploc20multilabelsubcellularlocalizationpredictionusingproteinlanguagemodels AT wintherole deeploc20multilabelsubcellularlocalizationpredictionusingproteinlanguagemodels |