
A Novel Model for DNA Sequence Similarity Analysis Based on Graph Theory

Bibliographic Details
Main Authors: Qi, Xingqin; Wu, Qin; Zhang, Yusen; Fuller, Eddie; Zhang, Cun-Quan
Format: Online Article Text
Language: English
Published: Libertas Academica 2011
Subjects: Original Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3204935/
https://www.ncbi.nlm.nih.gov/pubmed/22065497
http://dx.doi.org/10.4137/EBO.S7364
collection PubMed
description Determining sequence similarity is one of the major steps in computational phylogenetic studies. During evolutionary history, not only mutations of individual nucleotides but also subsequent rearrangements have occurred. One of the major tasks of computational biologists has been to develop novel mathematical descriptors for similarity analysis that capture information about these various mutation phenomena simultaneously. In this paper, instead of using traditional bases (eg, nucleotide frequency or geometric representations) to construct mathematical descriptors, we construct novel descriptors based on graph theory. In particular, for each DNA sequence we set up a weighted directed graph, and the adjacency matrix of this graph is used to induce a representative vector for the sequence. The new approach measures similarity based on both the ordering and the frequency of nucleotides, so much more information is taken into account. As an application, the method is tested on a set of 0.9-kb mtDNA sequences from twelve primate species. All of the resulting phylogenetic trees, computed with various distance estimations, have the same topology and are generally consistent with results reported in earlier studies, which demonstrates the effectiveness of the new method. We also test the method on a simulated data set; the results show that it performs better than the traditional global alignment method when subsequent rearrangements occur frequently during evolutionary history.
id pubmed-3204935
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Evol Bioinform Online, Original Research; Libertas Academica, 2011-10-04
rights © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
topic Original Research
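The abstract in this record describes the pipeline only at a high level: a weighted directed graph per sequence, a representative vector induced from its adjacency matrix, and distances between those vectors. The following Python sketch illustrates that pipeline under assumptions that are not taken from the paper: the vertices are the four nucleotides A, C, G and T, the weight of edge u → v is the relative frequency of the adjacent pair uv in the sequence, the representative vector is the adjacency matrix flattened row by row, and distances are Euclidean. The function names are hypothetical, and this is not the authors' exact construction.

```python
import math

NUCLEOTIDES = "ACGT"


def adjacency_matrix(seq: str) -> dict[tuple[str, str], float]:
    """Weighted directed graph on the vertices A, C, G, T (assumed construction).

    The weight of edge u -> v is the number of positions i with seq[i] == u and
    seq[i + 1] == v, divided by the number of adjacent pairs counted, so the
    weights depend on both nucleotide order and nucleotide frequency.
    """
    seq = seq.upper()
    weights = {(u, v): 0.0 for u in NUCLEOTIDES for v in NUCLEOTIDES}
    total = 0
    for u, v in zip(seq, seq[1:]):
        # Skip any pair that contains an ambiguous base such as N.
        if u in NUCLEOTIDES and v in NUCLEOTIDES:
            weights[(u, v)] += 1
            total += 1
    if total:
        for edge in weights:
            weights[edge] /= total
    return weights


def descriptor(seq: str) -> list[float]:
    """Flatten the 4x4 adjacency matrix row by row (AA, AC, ..., TT) into a 16-vector."""
    m = adjacency_matrix(seq)
    return [m[(u, v)] for u in NUCLEOTIDES for v in NUCLEOTIDES]


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise distances between named sequences, e.g. as input to a tree-building method."""
    vecs = {name: descriptor(s) for name, s in seqs.items()}
    return {(p, q): euclidean_distance(vecs[p], vecs[q]) for p in vecs for q in vecs}


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the paper.
    toy = {"s1": "AAAACCCC", "s2": "CCCCAAAA", "s3": "ACGTACGT"}
    for (p, q), d in distance_matrix(toy).items():
        if p < q:
            print(f"{p} {q} {d:.4f}")
```

For example, swapping the two halves of `AAAACCCC` to give `CCCCAAAA` changes only the AC and CA entries, each by 1/7, so the Euclidean distance is √2/7 ≈ 0.20. Because a block rearrangement alters only the pairs at its junctions, order-aware descriptors of this kind can stay close under rearrangement, whereas a global alignment would penalize the moved block. This is offered as intuition only, not as a reproduction of the paper's simulation results.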