Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties

The difficult aspect of developing new protein sequence comparison techniques is coming up with a method that can quickly and effectively handle huge data sets of sequences of various lengths. In this work, we first obtain two numerical representations of protein sequences se...

Bibliographic Details
Main authors: Pal, Jayanta; Ghosh, Soumen; Maji, Bansibadan; Bhattacharya, Dilip Kumar
Format: Online Article Text
Language: English
Published: American Chemical Society 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9631895/
https://www.ncbi.nlm.nih.gov/pubmed/36340165
http://dx.doi.org/10.1021/acsomega.2c06103
author Pal, Jayanta
Ghosh, Soumen
Maji, Bansibadan
Bhattacharya, Dilip Kumar
collection PubMed
description The difficult aspect of developing new protein sequence comparison techniques is coming up with a method that can quickly and effectively handle huge data sets of sequences of various lengths. In this work, we first obtain two numerical representations of protein sequences separately, one based on a physical property and one based on a chemical property of amino acids. The lengths of all the sequences under comparison are made equal by appending the required number of zeroes. Then, the fast Fourier transform is applied to each numerical series to obtain the corresponding spectrum. Next, the spectrum values are reduced by the standard inter-coefficient difference method. Finally, the normalized values of the reduced spectrum are selected as the descriptors for protein sequence comparison. From these descriptors, distance matrices are computed using the Euclidean distance. They are subsequently used to draw phylogenetic trees with the UPGMA algorithm. Phylogenetic trees are first constructed for 9 ND4, 9 ND5, and 9 ND6 proteins using the polarity value as the chemical property and the molecular weight as the physical property. The trees are compared, and it is seen that polarity is a better choice than molecular weight for protein sequence comparison. Next, using the polarity property, phylogenetic trees are obtained for 12 baculovirus and 24 transferrin proteins. The results are compared with those obtained earlier on the identical sequences by other methods. Three assessment criteria are considered for comparison of the results: quality based on rationalized perception, quantitative measures based on symmetric distance, and computational speed. In all the cases, the results are found to be more satisfactory than those of the earlier methods.
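The steps in the description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not define the inter-coefficient difference step or the normalization, so the sketch assumes differences between adjacent FFT magnitude coefficients and scaling to unit Euclidean norm. The names `TOY_SCALE`, `descriptor`, and `distance_matrix` are hypothetical, and the values in `TOY_SCALE` are placeholders, not real polarity data.

```python
import numpy as np

# Hypothetical placeholder scale over a tiny alphabet.
# These numbers are NOT real polarity values; substitute a published
# amino acid property table covering all 20 residues.
TOY_SCALE = {"A": 1.0, "C": 2.0, "D": 3.0, "E": 4.0}


def descriptor(seq, scale, length):
    """Map one sequence to a fixed-length descriptor vector."""
    # Numerical representation: one property value per residue.
    values = np.array([scale[aa] for aa in seq], dtype=float)
    # Equalize lengths by appending zeroes.
    padded = np.zeros(length)
    padded[: len(values)] = values
    # Magnitude spectrum from the fast Fourier transform.
    magnitude = np.abs(np.fft.fft(padded))
    # Assumption: reduce the spectrum by differencing adjacent coefficients.
    reduced = np.diff(magnitude)
    # Assumption: normalize to unit Euclidean norm.
    norm = np.linalg.norm(reduced)
    return reduced / norm if norm > 0 else reduced


def distance_matrix(seqs, scale):
    """Pairwise Euclidean distances between sequence descriptors."""
    length = max(len(s) for s in seqs)
    desc = np.array([descriptor(s, scale, length) for s in seqs])
    diff = desc[:, None, :] - desc[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


if __name__ == "__main__":
    toy_sequences = ["ACDE", "ACD", "EDCA"]  # made-up examples
    print(distance_matrix(toy_sequences, TOY_SCALE))
```

For the tree-building step, average-linkage hierarchical clustering is the same procedure as UPGMA, so a matrix like this one could be converted to condensed form with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` with `method="average"`.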
format Online
Article
Text
id pubmed-9631895
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-9631895 2022-11-04 Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties Pal, Jayanta; Ghosh, Soumen; Maji, Bansibadan; Bhattacharya, Dilip Kumar ACS Omega
American Chemical Society 2022-10-17 /pmc/articles/PMC9631895/ /pubmed/36340165 http://dx.doi.org/10.1021/acsomega.2c06103 Text en © 2022 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties