In-Pero: Exploiting Deep Learning Embeddings of Protein Sequences to Predict the Localisation of Peroxisomal Proteins

Peroxisomes are ubiquitous membrane-bound organelles, and aberrant localisation of peroxisomal proteins contributes to the pathogenesis of several disorders. Many computational methods focus on assigning protein sequences to subcellular compartments, but there are no specific tools tailored for the...

Bibliographic Details
Main authors: Anteghini, Marco; Martins dos Santos, Vitor; Saccenti, Edoardo
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8232616/
https://www.ncbi.nlm.nih.gov/pubmed/34203866
http://dx.doi.org/10.3390/ijms22126409
author Anteghini, Marco
Martins dos Santos, Vitor
Saccenti, Edoardo
collection PubMed
description Peroxisomes are ubiquitous membrane-bound organelles, and aberrant localisation of peroxisomal proteins contributes to the pathogenesis of several disorders. Many computational methods focus on assigning protein sequences to subcellular compartments, but there are no specific tools tailored for the sub-localisation (matrix vs. membrane) of peroxisome proteins. We present here In-Pero, a new method for predicting protein sub-peroxisomal cellular localisation. In-Pero combines standard machine learning approaches with recently proposed multi-dimensional deep-learning representations of the protein amino-acid sequence. It showed a classification accuracy above 0.9 in predicting peroxisomal matrix and membrane proteins. The method is trained and tested using a double cross-validation approach on a curated data set comprising 160 peroxisomal proteins with experimental evidence for sub-peroxisomal localisation. We further show that the proposed approach can be easily adapted (In-Mito) to the prediction of mitochondrial protein localisation obtaining performances for certain classes of proteins (matrix and inner-membrane) superior to existing tools.
format Online
Article
Text
id pubmed-8232616
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8232616 2021-06-26 Int J Mol Sci Article MDPI 2021-06-15 /pmc/articles/PMC8232616/ /pubmed/34203866 http://dx.doi.org/10.3390/ijms22126409 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title In-Pero: Exploiting Deep Learning Embeddings of Protein Sequences to Predict the Localisation of Peroxisomal Proteins
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8232616/
https://www.ncbi.nlm.nih.gov/pubmed/34203866
http://dx.doi.org/10.3390/ijms22126409