
A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector

Similarity/dissimilarity analysis is a key way of understanding the biology of an organism by identifying the origin of new genes/sequences. Sequence data are grouped according to biological relationships. The number of sequences in any group can grow every day. All the...


Bibliographic Details
Main Authors: Abd Elwahaab, Marwa A., Abo-Elkhier, Mervat M., Abo el Maaty, Moheb I.
Format: Online Article Text
Language: English
Published: Hindawi 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6530227/
https://www.ncbi.nlm.nih.gov/pubmed/31205946
http://dx.doi.org/10.1155/2019/8702968
author Abd Elwahaab, Marwa A.
Abo-Elkhier, Mervat M.
Abo el Maaty, Moheb I.
collection PubMed
description Similarity/dissimilarity analysis is a key way of understanding the biology of an organism by identifying the origin of new genes/sequences. Sequence data are grouped according to biological relationships, and the number of sequences in any group can grow every day. Existing alignment-free methods demonstrate the utility of their approaches by producing a similarity/dissimilarity matrix. Although this matrix is clear, it measures the degree of similarity only between individual sequences. In our work, a representative is introduced for each of three groups of protein sequences. Based on the group representative, a similarity/dissimilarity vector is evaluated instead of the ordinary similarity/dissimilarity matrix. The approach is applied to three selected groups of protein sequences: beta globin, NADH dehydrogenase subunit 5 (ND5), and spike protein sequences. A cross-group comparison is performed to confirm the distinctness of each group. A qualitative comparison between our approach, previous articles, and the phylogenetic tree of these protein sequences demonstrated the utility of our approach.
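The contrast drawn in the description, one dissimilarity value per sequence measured against a group representative instead of a full pairwise matrix, can be illustrated with a minimal sketch. The record does not state how the authors build their representative or which features they use, so everything here is an assumption made for illustration: amino-acid composition as the feature vector, the component-wise mean as the representative, Euclidean distance as the dissimilarity, and the toy sequences themselves, which are not real protein data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Relative frequency of each standard amino acid (assumed feature)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def group_representative(seqs):
    """Component-wise mean of the group's feature vectors (assumed definition)."""
    feats = [composition(s) for s in seqs]
    return [sum(col) / len(feats) for col in zip(*feats)]


def dissimilarity_vector(seqs, representative):
    """One Euclidean distance per sequence, measured to the representative."""
    return [math.dist(composition(s), representative) for s in seqs]


def dissimilarity_matrix(seqs):
    """The ordinary pairwise alternative: n x n Euclidean distances."""
    feats = [composition(s) for s in seqs]
    return [[math.dist(a, b) for b in feats] for a in feats]


# Hypothetical toy group, not real protein sequences.
group = ["ACDA", "ACDC", "ACDD"]
rep = group_representative(group)
vec = dissimilarity_vector(group, rep)   # n values
mat = dissimilarity_matrix(group)        # n * n values
```

Under these assumptions, a newly added sequence needs only one distance to the stored representative, whereas the matrix needs a new row and column.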
format Online
Article
Text
id pubmed-6530227
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-6530227 2019-06-16 A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector Abd Elwahaab, Marwa A. Abo-Elkhier, Mervat M. Abo el Maaty, Moheb I. Biomed Res Int Research Article Hindawi 2019-05-08 /pmc/articles/PMC6530227/ /pubmed/31205946 http://dx.doi.org/10.1155/2019/8702968 Text en Copyright © 2019 Marwa A. Abd Elwahaab et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6530227/
https://www.ncbi.nlm.nih.gov/pubmed/31205946
http://dx.doi.org/10.1155/2019/8702968