
A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance

Classification of DNA sequences is an important problem in bioinformatics, yet most existing methods for phylogenetic analysis, including Multiple Sequence Alignment (MSA), are time-consuming and computationally expensive. Alignment-free methods are now popular, whereas the manual int...


Bibliographic Details
Main authors: Dong, Rui; He, Lily; He, Rong Lucy; Yau, Stephen S.-T.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6465635/
https://www.ncbi.nlm.nih.gov/pubmed/31024610
http://dx.doi.org/10.3389/fgene.2019.00234
author Dong, Rui
He, Lily
He, Rong Lucy
Yau, Stephen S.-T.
collection PubMed
description Classification of DNA sequences is an important problem in bioinformatics, yet most existing methods for phylogenetic analysis, including Multiple Sequence Alignment (MSA), are time-consuming and computationally expensive. Alignment-free methods are now popular, but the manual intervention they require usually decreases accuracy. In addition, most methods neglect the interactions among nucleotides. Here we propose a new Accumulated Natural Vector (ANV) method, which represents each DNA sequence as a point in ℝ^18. By calculating the Accumulated Indicator Functions of the nucleotides, we obtain an Accumulated Natural Vector for each sequence. This vector captures not only the distribution of each nucleotide but also the covariance among nucleotides. Global comparison of DNA sequences or genomes can thus be done easily in ℝ^18. Tests of ANV on datasets of different sizes and types demonstrate the accuracy and time efficiency of the proposed method.
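The abstract does not spell out the formulas, so the following is only a minimal Python sketch of how an 18-dimensional vector of this kind could be assembled. It assumes that the accumulated indicator function of a nucleotide is its running count along the sequence, and that the 18 components are 4 counts, 4 means, 4 variances and 6 pairwise covariances of those running counts. The function names, the division by sequence length, and the use of Euclidean distance for comparison are all illustrative assumptions, not the authors' published definitions.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"


def accumulated_indicator(seq, nucleotide):
    """Running count of `nucleotide`: entry i is its number of occurrences in seq[:i+1]."""
    total = 0
    running = []
    for ch in seq:
        if ch == nucleotide:
            total += 1
        running.append(total)
    return running


def anv_sketch(seq):
    """Return an 18-component vector (hypothetical layout, see lead-in).

    Layout: 4 counts, 4 means, 4 variances, 6 pairwise covariances,
    all computed from the running counts of A, C, G and T.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must be non-empty")

    acc = {k: accumulated_indicator(seq, k) for k in NUCLEOTIDES}
    counts = [acc[k][-1] for k in NUCLEOTIDES]
    means = {k: sum(acc[k]) / n for k in NUCLEOTIDES}

    def cov(a, b):
        # Assumed normalisation: divide by the sequence length n.
        return sum(
            (x - means[a]) * (y - means[b]) for x, y in zip(acc[a], acc[b])
        ) / n

    variances = [cov(k, k) for k in NUCLEOTIDES]
    covariances = [cov(a, b) for a, b in combinations(NUCLEOTIDES, 2)]
    return counts + [means[k] for k in NUCLEOTIDES] + variances + covariances


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
```

Under these assumptions, two sequences would be compared by calling `euclidean(anv_sketch(s1), anv_sketch(s2))`, which needs no alignment and takes time linear in the sequence lengths.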
format Online
Article
Text
id pubmed-6465635
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6465635 2019-04-25 Front Genet Genetics Frontiers Media S.A. 2019-04-09 /pmc/articles/PMC6465635/ /pubmed/31024610 http://dx.doi.org/10.3389/fgene.2019.00234 Text en Copyright © 2019 Dong, He, He and Yau. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6465635/
https://www.ncbi.nlm.nih.gov/pubmed/31024610
http://dx.doi.org/10.3389/fgene.2019.00234