Effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction
The use of raw amino acid sequences as input for deep learning models for protein functional prediction has gained popularity in recent years. This scheme requires handling proteins of different lengths, whereas deep learning models require inputs of the same shape. To accomplish this, zeros are usually add...
Main authors: | Lopez-del Rio, Angela; Martin, Maria; Perera-Lluna, Alexandre; Saidi, Rabie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7471694/ https://www.ncbi.nlm.nih.gov/pubmed/32884053 http://dx.doi.org/10.1038/s41598-020-71450-8 |
_version_ | 1783578822547865600 |
---|---|
author | Lopez-del Rio, Angela; Martin, Maria; Perera-Lluna, Alexandre; Saidi, Rabie |
author_facet | Lopez-del Rio, Angela; Martin, Maria; Perera-Lluna, Alexandre; Saidi, Rabie |
author_sort | Lopez-del Rio, Angela |
collection | PubMed |
description | The use of raw amino acid sequences as input for deep learning models for protein functional prediction has gained popularity in recent years. This scheme requires handling proteins of different lengths, whereas deep learning models require inputs of the same shape. To accomplish this, zeros are usually added to each sequence up to an established common length in a process called zero-padding. However, the effect of different padding strategies on model performance and data structure is still unknown. We propose and implement four novel types of padding for amino acid sequences. We then analysed the impact of different ways of padding the amino acid sequences in a hierarchical Enzyme Commission number prediction problem. Results show that padding has an effect on model performance even when convolutional layers are involved. In contrast to most deep learning work, which focuses mainly on architectures, this study highlights the relevance of the padding process, often deemed of low importance, and raises awareness of the need to refine it for better performance. The code of this analysis is publicly available at https://github.com/b2slab/padding_benchmark. |
format | Online Article Text |
id | pubmed-7471694 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7471694; 2020-09-04; Lopez-del Rio, Angela; Martin, Maria; Perera-Lluna, Alexandre; Saidi, Rabie; Sci Rep; Article; Nature Publishing Group UK; 2020-09-03; /pmc/articles/PMC7471694/; /pubmed/32884053; http://dx.doi.org/10.1038/s41598-020-71450-8; Text; en; © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article; Lopez-del Rio, Angela; Martin, Maria; Perera-Lluna, Alexandre; Saidi, Rabie; Effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction |
title | Effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction |
title_sort | effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7471694/ https://www.ncbi.nlm.nih.gov/pubmed/32884053 http://dx.doi.org/10.1038/s41598-020-71450-8 |
work_keys_str_mv | AT lopezdelrioangela effectofsequencepaddingontheperformanceofdeeplearningmodelsinarchaealproteinfunctionalprediction AT martinmaria effectofsequencepaddingontheperformanceofdeeplearningmodelsinarchaealproteinfunctionalprediction AT pererallunaalexandre effectofsequencepaddingontheperformanceofdeeplearningmodelsinarchaealproteinfunctionalprediction AT saidirabie effectofsequencepaddingontheperformanceofdeeplearningmodelsinarchaealproteinfunctionalprediction |
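
The zero-padding step mentioned in the description above (appending zeros to each sequence up to a common length) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the GitHub link in the record. The integer encoding, the names `AA_TO_INDEX`, `encode` and `zero_pad`, and the choice to truncate sequences longer than the common length are all assumptions made only for illustration.

```python
import numpy as np

# Hypothetical integer encoding: 0 is reserved for padding and 1..20 for the
# standard amino acids (an illustrative choice, not taken from the paper).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Map an amino acid string to a list of integer codes."""
    return [AA_TO_INDEX[aa] for aa in sequence]


def zero_pad(sequences, max_length):
    """Append zeros to each encoded sequence up to max_length.

    Sequences longer than max_length are truncated (an assumption made
    here for simplicity).
    """
    batch = np.zeros((len(sequences), max_length), dtype=np.int64)
    for row, seq in enumerate(sequences):
        codes = encode(seq)[:max_length]
        batch[row, :len(codes)] = codes
    return batch


# Three toy sequences of different lengths become one same-shape array.
batch = zero_pad(["MKV", "ACDEF", "G"], max_length=6)
print(batch.shape)
```

Every row of the resulting array has the same length, so the batch can be fed to a model that expects fixed-shape input; the trailing zeros carry no sequence information.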