
Quantitative Analysis of Protein Evolution: The Phylogeny of Osteopontin


Bibliographic Details
Main authors: Wang, Xia; Weber, Georg F.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8415472/
https://www.ncbi.nlm.nih.gov/pubmed/34484297
http://dx.doi.org/10.3389/fgene.2021.700789
Description
Summary: The phylogenetic analysis of proteins conventionally relies on the evaluation of amino acid sequences or coding sequences. Individual amino acids have measurable features that allow the translation from strings of letters (amino acids or bases) into strings of numbers (physico-chemical properties). When the letters are converted to measurable properties, such numerical strings can be evaluated quantitatively with various tools of complex systems research. We build on our prior phylogenetic analysis of the cytokine Osteopontin to validate the quantitative approach toward the study of protein evolution. Phylogenetic trees constructed from the number strings differentiate among all sequences. In pairwise comparisons, autocorrelation, average mutual information, and box-counting dimension yield one number each for the overall relatedness between sequences. We also find that bivariate wavelet analysis distinguishes hypermutable regions from conserved regions of the protein. The investigation of protein evolution via quantitative study of the physico-chemical characteristics pertaining to the amino acid building blocks broadens the spectrum of applicable research tools, accounts for mutation as well as selection, gives access to multiple vistas depending on the property evaluated, discriminates more accurately among sequences, and renders the analysis more quantitative than utilizing strings of letters as starting points.
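The letter-to-number translation mentioned in the summary can be illustrated with a minimal sketch. It rests on several assumptions that do not come from this record: the Kyte-Doolittle hydropathy index is used as one possible physico-chemical property, the function names `to_numbers` and `autocorrelation` are hypothetical, the peptide is an arbitrary string rather than an Osteopontin sequence, and the autocorrelation shown is the ordinary single-series sample estimate, which may differ from the pairwise procedure the authors applied.

```python
# Kyte-Doolittle hydropathy index; an assumed choice of property scale,
# not necessarily one used by the authors.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_numbers(seq, scale=KYTE_DOOLITTLE):
    """Translate a one-letter amino acid string into a list of property values."""
    return [scale[aa] for aa in seq.upper()]


def autocorrelation(values, lag=1):
    """Sample autocorrelation of a numeric series at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / var


if __name__ == "__main__":
    peptide = "ACDEFGHIK"  # arbitrary illustrative peptide, not a real protein
    series = to_numbers(peptide)
    print(series)
    print(autocorrelation(series, lag=1))
```

Swapping in a different scale (for example charge or side-chain volume) changes only the dictionary, which mirrors the summary's point that each property evaluated offers a different view of the same sequence.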