
Mathematical Basis of Predicting Dominant Function in Protein Sequences by a Generic HMM–ANN Algorithm

The accurate annotation of an unknown protein sequence depends on extant data of template sequences. This could be empirical or sets of reference sequences, and provides an exhaustive pool of probable functions. Individual methods of predicting dominant function possess shortcomings such as varying...


Bibliographic Details
Main author:	Kundu, Siddhartha
Format:	Online Article Text
Language:	English
Published:	Springer Netherlands 2018
Subjects:	Regular Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250805/ https://www.ncbi.nlm.nih.gov/pubmed/29700659 http://dx.doi.org/10.1007/s10441-018-9327-x

author	Kundu, Siddhartha
collection	PubMed
description	The accurate annotation of an unknown protein sequence depends on extant data of template sequences. This could be empirical or sets of reference sequences, and provides an exhaustive pool of probable functions. Individual methods of predicting dominant function possess shortcomings such as varying degrees of inter-sequence redundancy, arbitrary domain inclusion thresholds, heterogeneous parameterization protocols, and ill-conditioned input channels. Here, I present a rigorous theoretical derivation of various steps of a generic algorithm that integrates and utilizes several statistical methods to predict the dominant function in unknown protein sequences. The accompanying mathematical proofs, interval definitions, analysis, and numerical computations are meant not only to offer insights into the specificity and accuracy of predictions, but also to provide details of the operational mechanisms involved in the integration and its ensuing rigor. The algorithm uses numerically modified raw hidden Markov model scores of well-defined sets of training sequences and clusters them on the basis of known function. The results are then fed into an artificial neural network, the predictions of which can be refined using the available data. This pipeline is trained recursively and can be used to discern the dominant principal function, and thereby annotate an unknown protein sequence. Whilst the approach is complex, the specificity of the final predictions can help laboratory workers design their experiments with greater confidence.
format	Online Article Text
id	pubmed-7250805
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Springer Netherlands
record_format	MEDLINE/PubMed
spelling	pubmed-7250805 2020-06-05 Mathematical Basis of Predicting Dominant Function in Protein Sequences by a Generic HMM–ANN Algorithm Kundu, Siddhartha Acta Biotheor Regular Article Springer Netherlands 2018-04-26 2018 /pmc/articles/PMC7250805/ /pubmed/29700659 http://dx.doi.org/10.1007/s10441-018-9327-x Text en © The Author(s) 2018, Corrected Publication 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Mathematical Basis of Predicting Dominant Function in Protein Sequences by a Generic HMM–ANN Algorithm
topic	Regular Article


Similar Items