Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties
The difficult aspect of developing new protein sequence comparison techniques is devising a method that can handle huge data sets of sequences of various lengths quickly and effectively. In this work, we first obtain two numerical representations of protein sequences se...
Main authors: | Pal, Jayanta; Ghosh, Soumen; Maji, Bansibadan; Bhattacharya, Dilip Kumar |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9631895/ https://www.ncbi.nlm.nih.gov/pubmed/36340165 http://dx.doi.org/10.1021/acsomega.2c06103 |
_version_ | 1784823912590213120 |
---|---|
author | Pal, Jayanta Ghosh, Soumen Maji, Bansibadan Bhattacharya, Dilip Kumar |
author_facet | Pal, Jayanta Ghosh, Soumen Maji, Bansibadan Bhattacharya, Dilip Kumar |
author_sort | Pal, Jayanta |
collection | PubMed |
description | The difficult aspect of developing new protein sequence comparison techniques is devising a method that can handle huge data sets of sequences of various lengths quickly and effectively. In this work, we first obtain two numerical representations of protein sequences separately, one based on a physical property and one on a chemical property of amino acids. The lengths of all the sequences under comparison are made equal by appending the required number of zeroes. Then, the fast Fourier transform is applied to each numerical series to obtain the corresponding spectrum. Next, the spectrum values are reduced by the standard inter-coefficient difference method. Finally, the normalized values of the reduced spectrum are selected as the descriptors for protein sequence comparison. From these descriptors, distance matrices are computed using the Euclidean distance. They are subsequently used to draw phylogenetic trees using the UPGMA algorithm. Phylogenetic trees are first constructed for 9 ND4, 9 ND5, and 9 ND6 proteins using the polarity value as the chemical property and the molecular weight as the physical property. The trees are compared, and polarity is seen to be a better choice than molecular weight for protein sequence comparison. Next, using the polarity property, phylogenetic trees are obtained for 12 baculovirus and 24 transferrin proteins. The results are compared with those obtained earlier on the identical sequences by other methods. Three assessment criteria are considered for the comparison: quality based on rationalized perception, quantitative measures based on symmetric distance, and computational speed. In all cases, the results are found to be more satisfactory than those of the earlier methods. |
format | Online Article Text |
id | pubmed-9631895 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-9631895 2022-11-04 Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties Pal, Jayanta Ghosh, Soumen Maji, Bansibadan Bhattacharya, Dilip Kumar ACS Omega The difficult aspect of developing new protein sequence comparison techniques is devising a method that can handle huge data sets of sequences of various lengths quickly and effectively. In this work, we first obtain two numerical representations of protein sequences separately, one based on a physical property and one on a chemical property of amino acids. The lengths of all the sequences under comparison are made equal by appending the required number of zeroes. Then, the fast Fourier transform is applied to each numerical series to obtain the corresponding spectrum. Next, the spectrum values are reduced by the standard inter-coefficient difference method. Finally, the normalized values of the reduced spectrum are selected as the descriptors for protein sequence comparison. From these descriptors, distance matrices are computed using the Euclidean distance. They are subsequently used to draw phylogenetic trees using the UPGMA algorithm. Phylogenetic trees are first constructed for 9 ND4, 9 ND5, and 9 ND6 proteins using the polarity value as the chemical property and the molecular weight as the physical property. The trees are compared, and polarity is seen to be a better choice than molecular weight for protein sequence comparison. Next, using the polarity property, phylogenetic trees are obtained for 12 baculovirus and 24 transferrin proteins. The results are compared with those obtained earlier on the identical sequences by other methods. Three assessment criteria are considered for the comparison: quality based on rationalized perception, quantitative measures based on symmetric distance, and computational speed. In all cases, the results are found to be more satisfactory than those of the earlier methods.
American Chemical Society 2022-10-17 /pmc/articles/PMC9631895/ /pubmed/36340165 http://dx.doi.org/10.1021/acsomega.2c06103 Text en © 2022 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Pal, Jayanta Ghosh, Soumen Maji, Bansibadan Bhattacharya, Dilip Kumar Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties |
title | Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties |
title_full | Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties |
title_fullStr | Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties |
title_full_unstemmed | Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties |
title_short | Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties |
title_sort | mathematical approach to protein sequence comparison based on physiochemical properties |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9631895/ https://www.ncbi.nlm.nih.gov/pubmed/36340165 http://dx.doi.org/10.1021/acsomega.2c06103 |
work_keys_str_mv | AT paljayanta mathematicalapproachtoproteinsequencecomparisonbasedonphysiochemicalproperties AT ghoshsoumen mathematicalapproachtoproteinsequencecomparisonbasedonphysiochemicalproperties AT majibansibadan mathematicalapproachtoproteinsequencecomparisonbasedonphysiochemicalproperties AT bhattacharyadilipkumar mathematicalapproachtoproteinsequencecomparisonbasedonphysiochemicalproperties |
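
The pipeline summarized in the description field (numerical encoding, zero padding, FFT, spectrum reduction, normalization, Euclidean distances, UPGMA) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say which polarity scale was used, so the Grantham (1974) scale is assumed here; the reduction step is a simple stand-in (absolute differences of adjacent spectrum magnitudes, averaged into `n_bins` groups) and may differ from the paper's inter-coefficient difference method; the function names, `n_bins`, and the example sequences are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# ASSUMPTION: Grantham (1974) polarity scale; the record does not name the scale.
POLARITY = {
    "A": 8.1, "R": 10.5, "N": 11.6, "D": 13.0, "C": 5.5,
    "Q": 10.5, "E": 12.3, "G": 9.0, "H": 10.4, "I": 5.2,
    "L": 4.9, "K": 11.3, "M": 5.7, "F": 5.2, "P": 8.0,
    "S": 9.2, "T": 8.6, "W": 5.4, "Y": 6.2, "V": 5.9,
}


def descriptors(seqs, n_bins=4):
    """Return one normalized descriptor row per protein sequence."""
    max_len = max(len(s) for s in seqs)
    rows = []
    for s in seqs:
        # Numerical representation, zero-padded to the longest sequence.
        x = np.zeros(max_len)
        x[: len(s)] = [POLARITY[aa] for aa in s]
        # Magnitude spectrum of the real-valued series.
        mag = np.abs(np.fft.rfft(x))
        # STAND-IN reduction (not necessarily the paper's method):
        # adjacent-coefficient differences averaged into a few bins.
        diffs = np.abs(np.diff(mag))
        bins = np.array_split(diffs, min(n_bins, len(diffs)))
        reduced = np.array([b.mean() for b in bins])
        # Normalize each descriptor to unit Euclidean length.
        norm = np.linalg.norm(reduced)
        rows.append(reduced / norm if norm > 0 else reduced)
    return np.vstack(rows)


def upgma_tree(seqs, n_bins=4):
    """Return (condensed Euclidean distances, UPGMA linkage matrix)."""
    dist = pdist(descriptors(seqs, n_bins), metric="euclidean")
    # SciPy's "average" linkage is the UPGMA algorithm.
    return dist, linkage(dist, method="average")


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy = ["MTNIRKSHPLMKIINN", "MTNIRKTHPLMKIVNNAF", "MLKLIVPTIMLLPLTWLSK"]
    dist, tree = upgma_tree(toy)
    print(dist)
    print(tree)
```

The linkage matrix can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree. Swapping `POLARITY` for a molecular-weight table would give the physical-property variant mentioned in the description.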