A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance
Classification of DNA sequences is an important problem in bioinformatics, yet most existing methods for phylogenetic analysis, including Multiple Sequence Alignment (MSA), are time-consuming and computationally expensive. Alignment-free methods are popular nowadays, but the manual int...
Main authors: | Dong, Rui; He, Lily; He, Rong Lucy; Yau, Stephen S.-T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2019 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6465635/ https://www.ncbi.nlm.nih.gov/pubmed/31024610 http://dx.doi.org/10.3389/fgene.2019.00234 |
_version_ | 1783410968237178880 |
---|---|
author | Dong, Rui; He, Lily; He, Rong Lucy; Yau, Stephen S.-T. |
author_facet | Dong, Rui; He, Lily; He, Rong Lucy; Yau, Stephen S.-T. |
author_sort | Dong, Rui |
collection | PubMed |
description | Classification of DNA sequences is an important problem in bioinformatics, yet most existing methods for phylogenetic analysis, including Multiple Sequence Alignment (MSA), are time-consuming and computationally expensive. Alignment-free methods are popular nowadays, but the manual intervention they require usually decreases accuracy. In addition, most methods neglect the interactions among nucleotides. Here we propose a new Accumulated Natural Vector (ANV) method, which represents each DNA sequence by a point in ℝ¹⁸. By calculating the Accumulated Indicator Functions of the nucleotides, we obtain an Accumulated Natural Vector for each sequence. This vector not only captures the distribution of each nucleotide but also provides the covariance among nucleotides. Thus, global comparison of DNA sequences or genomes can be done easily in ℝ¹⁸. Tests of ANV on datasets of different sizes and types demonstrate the accuracy and time efficiency of the proposed method. |
format | Online Article Text |
id | pubmed-6465635 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6465635 2019-04-25 A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance Dong, Rui He, Lily He, Rong Lucy Yau, Stephen S.-T. Front Genet Genetics Classification of DNA sequences is an important issue in the bioinformatics study, yet most existing methods for phylogenetic analysis including Multiple Sequence Alignment (MSA) are time-consuming and computationally expensive. The alignment-free methods are popular nowadays, whereas the manual intervention in those methods usually decreases the accuracy. Also, the interactions among nucleotides are neglected in most methods. Here we propose a new Accumulated Natural Vector (ANV) method which represents each DNA sequence by a point in ℝ¹⁸. By calculating the Accumulated Indicator Functions of nucleotides, we can further find an Accumulated Natural Vector for each sequence. This new Accumulated Natural Vector not only can capture the distribution of each nucleotide, but also provide the covariance among nucleotides. Thus global comparison of DNA sequences or genomes can be done easily in ℝ¹⁸. The tests of ANV of datasets of different sizes and types have proved the accuracy and time-efficiency of the new proposed ANV method. Frontiers Media S.A. 2019-04-09 /pmc/articles/PMC6465635/ /pubmed/31024610 http://dx.doi.org/10.3389/fgene.2019.00234 Text en Copyright © 2019 Dong, He, He and Yau. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Dong, Rui He, Lily He, Rong Lucy Yau, Stephen S.-T. A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance |
title | A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance |
title_full | A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance |
title_fullStr | A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance |
title_full_unstemmed | A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance |
title_short | A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance |
title_sort | novel approach to clustering genome sequences using inter-nucleotide covariance |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6465635/ https://www.ncbi.nlm.nih.gov/pubmed/31024610 http://dx.doi.org/10.3389/fgene.2019.00234 |
work_keys_str_mv | AT dongrui anovelapproachtoclusteringgenomesequencesusinginternucleotidecovariance AT helily anovelapproachtoclusteringgenomesequencesusinginternucleotidecovariance AT heronglucy anovelapproachtoclusteringgenomesequencesusinginternucleotidecovariance AT yaustephenst anovelapproachtoclusteringgenomesequencesusinginternucleotidecovariance AT dongrui novelapproachtoclusteringgenomesequencesusinginternucleotidecovariance AT helily novelapproachtoclusteringgenomesequencesusinginternucleotidecovariance AT heronglucy novelapproachtoclusteringgenomesequencesusinginternucleotidecovariance AT yaustephenst novelapproachtoclusteringgenomesequencesusinginternucleotidecovariance |