Deep learning program to predict protein functions based on sequence information

Deep learning technologies have been adopted to predict the functions of newly identified proteins in silico. However, most current models are not suitable for poorly characterized proteins because they require diverse information on target proteins. We designed a binary classification deep learning...

Bibliographic Details
Main authors:	Ko, Chang Woo, Huh, June, Park, Jong-Wan
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Method Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8790617/ https://www.ncbi.nlm.nih.gov/pubmed/35111575 http://dx.doi.org/10.1016/j.mex.2022.101622

author	Ko, Chang Woo Huh, June Park, Jong-Wan
collection	PubMed
description	Deep learning technologies have been adopted to predict the functions of newly identified proteins in silico. However, most current models are not suitable for poorly characterized proteins because they require diverse information on the target proteins. We designed a binary classification deep learning program that requires only sequence information. The program was named ‘FUTUSA’ (function teller using sequence alone). During sequence feature extraction by a convolutional neural network, it applies sequence segmentation in order to learn regional sequence patterns and their relationships. This segmentation process improved predictive performance by 49% over the full-length process. Compared with a baseline method, our approach achieved higher performance in predicting oxidoreductase activity. FUTUSA also performed strongly in predicting acetyltransferase and demethylase activities. Next, we tested whether FUTUSA can predict the functional consequences of point mutations. After being trained for monooxygenase activity, FUTUSA successfully predicted the impact of point mutations on phenylalanine hydroxylase, the enzyme implicated in the inherited metabolic disease phenylketonuria (PKU). This deep learning program can be used as a first-step tool for characterizing newly identified or poorly studied proteins. • We propose a new deep learning program that predicts protein functions in silico and requires nothing more than the protein sequence. • Sequence segmentation improves the efficiency of prediction. • The method makes it possible to predict the clinical impact of mutations or polymorphisms.
format	Online Article Text
id	pubmed-8790617
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-8790617 2022-02-01 MethodsX Method Article Elsevier 2022-01-15 /pmc/articles/PMC8790617/ /pubmed/35111575 http://dx.doi.org/10.1016/j.mex.2022.101622 Text en © 2022 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Deep learning program to predict protein functions based on sequence information
topic	Method Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8790617/ https://www.ncbi.nlm.nih.gov/pubmed/35111575 http://dx.doi.org/10.1016/j.mex.2022.101622
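
The description field in this record says that FUTUSA segments each protein sequence before a convolutional neural network extracts features from it. The record gives no implementation details, so the following is only a minimal sketch of what such a preprocessing step could look like, not the authors' code. The function names, the near-equal contiguous split, the number of segments, and the one-hot encoding are all assumptions made for illustration.

```python
from typing import List

# The 20 standard amino acids, in an arbitrary but fixed order (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def segment_sequence(seq: str, n_segments: int) -> List[str]:
    """Split seq into n_segments contiguous pieces of near-equal length.

    Hypothetical helper: the record does not state how FUTUSA segments
    sequences, so a simple non-overlapping split is used here.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    base, extra = divmod(len(seq), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        # The first `extra` segments take one additional residue each.
        end = start + base + (1 if i < extra else 0)
        segments.append(seq[start:end])
        start = end
    return segments


def one_hot(segment: str) -> List[List[int]]:
    """Encode each residue as a 20-dimensional indicator vector.

    Residues outside the 20 standard letters become all-zero rows.
    """
    rows = []
    for aa in segment.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    toy_sequence = "MSTAVLENPG"  # made-up sequence, not from the paper
    for piece in segment_sequence(toy_sequence, 3):
        print(piece, len(one_hot(piece)))
```

In a full pipeline, each encoded segment would be passed to a convolutional feature extractor and the per-segment features combined for the binary prediction; that part is omitted because the record does not describe the architecture.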
