
Protein transfer learning improves identification of heat shock protein families

Bibliographic details
Main authors: Min, Seonwoo; Kim, HyunGi; Lee, Byunghan; Yoon, Sungroh
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8130922/
https://www.ncbi.nlm.nih.gov/pubmed/34003870
http://dx.doi.org/10.1371/journal.pone.0251865
Collection: PubMed
Description: Heat shock proteins (HSPs) play a pivotal role as molecular chaperones under unfavorable conditions. Although HSPs are of great importance, their computational identification remains a significant challenge. Previous studies have two major limitations. First, they relied heavily on amino acid composition features, which inevitably limited their prediction performance. Second, their reported prediction performance was overestimated because the two prediction stages were evaluated independently and the training and test data were redundant. To overcome these limitations, we introduce two novel deep learning algorithms: (1) time-efficient DeepHSP and (2) high-performance DeeperHSP. DeepHSP, a convolutional neural network (CNN)-based model, classifies non-HSPs and six HSP families simultaneously. It outperforms state-of-the-art algorithms while requiring 14–15 times less time for both training and inference. We further improve on DeepHSP with protein transfer learning: whereas DeepHSP is trained on raw protein sequences, DeeperHSP is trained on top of pre-trained protein representations. As a result, DeeperHSP substantially outperforms state-of-the-art algorithms, increasing F1 scores by 20% in cross-validation and by 10% in independent test experiments. We envision that the proposed algorithms can provide proteome-wide prediction of HSPs and support various downstream analyses in pathology and clinical research.
Record ID: pubmed-8130922
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), Public Library of Science, 2021-05-18
Rights: © 2021 Min et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
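The description reports F1 scores for a model that assigns every protein to one of seven classes at once (non-HSP plus six HSP families). The sketch below is illustrative only and is not the authors' evaluation code. It assumes macro-averaged F1 over the classes that occur in the labels or predictions (this record does not say which averaging the authors used), and the family names in the hypothetical `CLASSES` list are assumed for illustration. Because each protein is scored once against its true label, a non-HSP misclassified as an HSP family lowers the scores of both classes involved, which is the kind of error a separately evaluated second stage would not see.

```python
from collections import Counter

# Hypothetical label set: non-HSP plus six HSP families (names assumed for illustration).
CLASSES = ["non-HSP", "HSP20", "HSP40", "HSP60", "HSP70", "HSP90", "HSP100"]


def per_class_f1(y_true, y_pred, classes=CLASSES):
    """Return {class: F1} for single-stage multi-class predictions.

    Each protein is scored once against its true label, so a non-HSP predicted
    as an HSP family counts as a false negative for "non-HSP" and a false
    positive for the predicted family.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); 0.0 when the class never occurs.
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    observed = sorted(set(y_true) | set(y_pred))
    if not observed:
        return 0.0
    scores = per_class_f1(y_true, y_pred, observed)
    return sum(scores.values()) / len(observed)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = ["non-HSP", "non-HSP", "HSP20", "HSP70", "HSP70", "HSP90"]
    y_pred = ["non-HSP", "HSP70", "HSP20", "HSP70", "HSP40", "HSP90"]
    for name, f1 in per_class_f1(y_true, y_pred).items():
        print(f"{name:8s} {f1:.3f}")
    print(f"macro-F1 {macro_f1(y_true, y_pred):.3f}")
```

Running the module prints per-class F1 for the toy labels and their macro average. Classes that never occur (here HSP60 and HSP100) show 0.0 in the per-class table but are left out of the macro average; averaging over the full label set instead would be an equally valid choice and would give a lower number.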