Detecting sequence signals in targeting peptides using deep learning

Bibliographic Details
Main authors: Almagro Armenteros, Jose Juan; Salvatore, Marco; Emanuelsson, Olof; Winther, Ole; von Heijne, Gunnar; Elofsson, Arne; Nielsen, Henrik
Format: Online, Article, Text
Language: English
Published: Life Science Alliance LLC, 2019
Subjects: Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6769257/
https://www.ncbi.nlm.nih.gov/pubmed/31570514
http://dx.doi.org/10.26508/lsa.201900429
collection PubMed
description In bioinformatics, machine learning methods have been used to predict features embedded in sequences. Contrary to what is generally assumed, machine learning approaches can also provide new insights into the underlying biology. Here, we demonstrate this by presenting TargetP 2.0, a novel state-of-the-art method to identify N-terminal sorting signals, which direct proteins to the secretory pathway, mitochondria, and chloroplasts or other plastids. By examining the strongest signals from the attention layer in the network, we find that the second residue in the protein, that is, the one following the initial methionine, has a strong influence on the classification. We observe that two-thirds of chloroplast and thylakoid transit peptides have an alanine in position 2, compared with 20% in other plant proteins. We also note that in fungi and single-celled eukaryotes, fewer than 30% of the targeting peptides have an amino acid that allows the removal of the N-terminal methionine, compared with 60% for the proteins without a targeting peptide. The importance of this feature for predictions has not been highlighted before.
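The position-2 statistics in the abstract above reduce to a simple tally over the residue immediately after the initiator methionine. The following is a minimal Python sketch of that kind of tally, under stated assumptions:

- Input is plain one-letter protein sequences.
- `position2_fraction` is a hypothetical helper.
- The set of residues permitting methionine removal (G, A, S, C, T, P, V) is the commonly cited methionine aminopeptidase rule, not a list taken from this record.

It is not the TargetP 2.0 method, which is a neural network classifier.

```python
from typing import Iterable

# Residues at position 2 that are commonly reported to allow methionine
# aminopeptidase to remove the initiator Met (small side chains).
# Assumption: standard MetAP rule, not a list taken from this record.
METAP_PERMISSIVE = frozenset("GASCTPV")


def position2_fraction(sequences: Iterable[str], residues: Iterable[str]) -> float:
    """Return the fraction of sequences whose residue 2 is in `residues`.

    Hypothetical helper for illustration. Position 2 is the residue right
    after the initiator methionine, so sequences that are shorter than two
    residues or do not start with M are skipped.
    """
    allowed = frozenset(r.upper() for r in residues)
    counted = 0
    hits = 0
    for seq in sequences:
        seq = seq.strip().upper()
        if len(seq) < 2 or seq[0] != "M":
            continue  # no initiator Met, so "position 2" is undefined here
        counted += 1
        if seq[1] in allowed:
            hits += 1
    if counted == 0:
        raise ValueError("no sequences starting with an initiator methionine")
    return hits / counted


if __name__ == "__main__":
    # Made-up toy sequences for illustration only; not data from the study.
    toy = ["MAASLLRS", "MASSAAV", "MKLLV", "MSTAP", "MRWLL"]
    print(position2_fraction(toy, "A"))               # 0.4
    print(position2_fraction(toy, METAP_PERMISSIVE))  # 0.6
```

Running the same tally separately on, for example, transit-peptide-bearing proteins and other plant proteins would mirror the kind of comparison the abstract reports (two-thirds versus 20% for alanine). The tally itself says nothing about how the network's attention layer weights position 2.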
id pubmed-6769257
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Life Sci Alliance (Methods), published 2019-09-30
rights © 2019 Armenteros et al. This article is available under a Creative Commons License (Attribution 4.0 International, as described at https://creativecommons.org/licenses/by/4.0/).
topic Methods