A simple strategy to enhance the speed of protein secondary structure prediction without sacrificing accuracy

The secondary structure prediction of proteins is a classic topic of computational structural biology with a variety of applications. During the past decade, the accuracy of prediction achieved by state-of-the-art algorithms has been >80%; meanwhile, the time cost of prediction increased rapidly...


Bibliographic Details
Main Authors:	Juan, Sheng-Hung; Chen, Teng-Ruei; Lo, Wei-Cheng
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2020
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7326220/ https://www.ncbi.nlm.nih.gov/pubmed/32603341 http://dx.doi.org/10.1371/journal.pone.0235153

author	Juan, Sheng-Hung; Chen, Teng-Ruei; Lo, Wei-Cheng
collection	PubMed
description	The secondary structure prediction of proteins is a classic topic of computational structural biology with a variety of applications. During the past decade, the accuracy of prediction achieved by state-of-the-art algorithms has been >80%; meanwhile, the time cost of prediction increased rapidly because of the exponential growth of fundamental protein sequence data. Based on literature studies and preliminary observations on the relationships between the size/homology of the fundamental protein dataset and the speed/accuracy of predictions, we raised two hypotheses that might help to determine the main factors influencing the efficiency of secondary structure prediction. Experimental results of size and homology reductions of the fundamental protein dataset supported those hypotheses. They revealed that shrinking the dataset could substantially cut the time cost of prediction at the price of a slight decrease in accuracy, whereas homology reduction of the dataset could, on the contrary, increase accuracy. Moreover, the Shannon information entropy could be applied to explain how accuracy was influenced by the size and homology of the dataset. Based on these findings, we proposed that a proper combination of size and homology reductions of the protein dataset could speed up secondary structure prediction while preserving the high accuracy of state-of-the-art algorithms. When the proposed strategy was tested with the 2018 fundamental protein dataset provided by the Universal Protein Resource, the speed of prediction was enhanced more than 20-fold while all accuracy measures remained equivalently high. These findings are expected to be helpful for improving the efficiency of research and applications that depend on the secondary structure prediction of proteins. To make future implementations of the proposed strategy easy, we have established a database of size- and homology-reduced protein datasets at http://10.life.nctu.edu.tw/UniRefNR.
format	Online Article Text
id	pubmed-7326220
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-7326220 2020-07-10 A simple strategy to enhance the speed of protein secondary structure prediction without sacrificing accuracy Juan, Sheng-Hung; Chen, Teng-Ruei; Lo, Wei-Cheng PLoS One Research Article Public Library of Science 2020-06-30 /pmc/articles/PMC7326220/ /pubmed/32603341 http://dx.doi.org/10.1371/journal.pone.0235153 Text en © 2020 Juan et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	A simple strategy to enhance the speed of protein secondary structure prediction without sacrificing accuracy
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7326220/ https://www.ncbi.nlm.nih.gov/pubmed/32603341 http://dx.doi.org/10.1371/journal.pone.0235153
