
Effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction

The use of raw amino acid sequences as input for deep learning models for protein functional prediction has gained popularity in recent years. This approach requires handling proteins of different lengths, whereas deep learning models require inputs of the same shape. To accomplish this, zeros are usually add...

Bibliographic Details
Main authors: Lopez-del Rio, Angela; Martin, Maria; Perera-Lluna, Alexandre; Saidi, Rabie
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7471694/
https://www.ncbi.nlm.nih.gov/pubmed/32884053
http://dx.doi.org/10.1038/s41598-020-71450-8
author Lopez-del Rio, Angela
Martin, Maria
Perera-Lluna, Alexandre
Saidi, Rabie
collection PubMed
description The use of raw amino acid sequences as input for deep learning models for protein functional prediction has gained popularity in recent years. This approach requires handling proteins of different lengths, whereas deep learning models require inputs of the same shape. To accomplish this, zeros are usually added to each sequence up to an established common length in a process called zero-padding. However, the effect of different padding strategies on model performance and data structure is still unknown. We propose and implement four novel ways of padding amino acid sequences. We then analyse the impact of different ways of padding amino acid sequences in a hierarchical Enzyme Commission number prediction problem. Results show that padding affects model performance even when convolutional layers are involved. In contrast to most deep learning work, which focuses mainly on architectures, this study highlights the relevance of padding, a process often deemed of low importance, and raises awareness of the need to refine it for better performance. The code for this analysis is publicly available at https://github.com/b2slab/padding_benchmark.
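As a rough illustration of the zero-padding that the abstract describes, the sketch below integer-encodes amino acid sequences and adds zeros up to a common length. The alphabet ordering, the function names `encode` and `zero_pad`, the placement of the zeros at the end of the sequence, and the truncation of longer sequences are all assumptions made for illustration; none of them is taken from the article or its repository.

```python
# Hypothetical sketch: names, alphabet order, end placement of the padding,
# and the truncation rule are assumptions, not the authors' implementation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Index 0 is reserved for padding, so residues are numbered from 1.
AA_TO_INT = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Map each residue to a positive integer; unknown residues raise KeyError."""
    return [AA_TO_INT[aa] for aa in sequence]


def zero_pad(encoded, max_len):
    """Append zeros up to max_len; sequences longer than max_len are truncated."""
    clipped = encoded[:max_len]
    return clipped + [0] * (max_len - len(clipped))


if __name__ == "__main__":
    proteins = ["MKV", "ACDEFG"]
    # Every row of the batch has the same length, as same-shape input requires.
    batch = [zero_pad(encode(p), max_len=5) for p in proteins]
    for row in batch:
        print(row)
```

Reserving 0 for the padding symbol keeps padded positions distinguishable from real residues. The abstract reports that the choice of padding strategy affects model performance, so this end-padding variant should be read as one possible baseline only.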
format Online
Article
Text
id pubmed-7471694
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7471694
2020-09-04
Effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction
Lopez-del Rio, Angela
Martin, Maria
Perera-Lluna, Alexandre
Saidi, Rabie
Sci Rep
Article
Nature Publishing Group UK 2020-09-03
/pmc/articles/PMC7471694/
/pubmed/32884053
http://dx.doi.org/10.1038/s41598-020-71450-8
Text
en
© The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7471694/
https://www.ncbi.nlm.nih.gov/pubmed/32884053
http://dx.doi.org/10.1038/s41598-020-71450-8