PVTree: A Sequential Pattern Mining Method for Alignment Independent Phylogeny Reconstruction

A phylogenetic tree is essential for understanding evolution and is usually constructed through multiple sequence alignment, which carries a heavy computational burden and requires sophisticated parameter tuning. Recently, alignment-free methods based on k-mer profiles or common substrings provide...

Bibliographic Details
Main authors: Kang, Yongyong; Yang, Xiaofei; Lin, Jiadong; Ye, Kai
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6410268/
https://www.ncbi.nlm.nih.gov/pubmed/30678245
http://dx.doi.org/10.3390/genes10020073
author Kang, Yongyong
Yang, Xiaofei
Lin, Jiadong
Ye, Kai
author_facet Kang, Yongyong
Yang, Xiaofei
Lin, Jiadong
Ye, Kai
author_sort Kang, Yongyong
collection PubMed
description A phylogenetic tree is essential for understanding evolution and is usually constructed through multiple sequence alignment, which carries a heavy computational burden and requires sophisticated parameter tuning. Recently, alignment-free methods based on k-mer profiles or common substrings have provided alternative ways to construct phylogenetic trees. However, most of these methods ignore the global similarities between sequences or certain valuable features, e.g., frequent patterns across the whole dataset. To improve on this, we propose an alignment-free algorithm based on sequential pattern mining, in which each sequence is converted into a binary representation of the sequential patterns shared among sequences. The phylogenetic tree is then constructed by clustering a distance matrix calculated from the pattern vectors. To increase accuracy for highly divergent sequences, we take pattern weight into account and filter out redundant sub-patterns. Results on both simulated and real data demonstrate that our method outperforms other alignment-free methods, especially for large sequence sets with low similarity.
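The pipeline outlined in the description (binary pattern vectors, then a distance matrix, then clustering into a tree) can be illustrated with a minimal Python sketch. It is not the PVTree implementation, and several parts are assumptions made for illustration only: contiguous k-mers that reach a support threshold stand in for mined sequential patterns, Jaccard distance stands in for the paper's distance measure, and plain average-linkage clustering stands in for its tree-building step. Pattern weighting and redundant sub-pattern filtering are omitted. All function names and parameters (`frequent_patterns`, `pattern_vector`, `jaccard_distance`, `cluster_tree`, `k`, `min_support`) are hypothetical.

```python
from itertools import combinations


def frequent_patterns(seqs, k=3, min_support=0.5):
    """Return the sorted k-mers found in at least min_support of the sequences.

    Assumption: contiguous k-mers are a simplified stand-in for the
    sequential patterns a real pattern miner would produce.
    """
    counts = {}
    for s in seqs:
        # Use a set so each k-mer is counted once per sequence.
        for p in {s[i:i + k] for i in range(len(s) - k + 1)}:
            counts[p] = counts.get(p, 0) + 1
    return sorted(p for p, c in counts.items() if c >= min_support * len(seqs))


def pattern_vector(seq, patterns):
    """Binary vector: 1 if the pattern occurs in the sequence, else 0."""
    return [int(p in seq) for p in patterns]


def jaccard_distance(u, v):
    """Jaccard distance between two binary vectors (assumed metric)."""
    both = sum(a & b for a, b in zip(u, v))
    either = sum(a | b for a, b in zip(u, v))
    return 1.0 - both / either if either else 0.0


def cluster_tree(names, vectors):
    """Average-linkage agglomerative clustering; returns a Newick-style
    topology string without branch lengths."""
    dist = {(i, j): jaccard_distance(vectors[i], vectors[j])
            for i, j in combinations(range(len(names)), 2)}
    # Each cluster maps an id to (label, list of original leaf indices).
    clusters = {i: (names[i], [i]) for i in range(len(names))}
    next_id = len(names)

    def linkage(a, b):
        # Mean pairwise distance between the leaves of two clusters.
        pairs = [(min(x, y), max(x, y))
                 for x in clusters[a][1] for y in clusters[b][1]]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        label = f"({clusters[a][0]},{clusters[b][0]})"
        members = clusters[a][1] + clusters[b][1]
        del clusters[a], clusters[b]
        clusters[next_id] = (label, members)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

Given a list of sequence names and the matching list of sequences, one would call `frequent_patterns` on the sequences, build one `pattern_vector` per sequence, and pass the names and vectors to `cluster_tree`. Sequences that share more patterns are merged earlier and so end up closer together in the resulting topology.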
format Online
Article
Text
id pubmed-6410268
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6410268 2019-03-26 PVTree: A Sequential Pattern Mining Method for Alignment Independent Phylogeny Reconstruction Kang, Yongyong Yang, Xiaofei Lin, Jiadong Ye, Kai Genes (Basel) Article A phylogenetic tree is essential for understanding evolution and is usually constructed through multiple sequence alignment, which carries a heavy computational burden and requires sophisticated parameter tuning. Recently, alignment-free methods based on k-mer profiles or common substrings have provided alternative ways to construct phylogenetic trees. However, most of these methods ignore the global similarities between sequences or certain valuable features, e.g., frequent patterns across the whole dataset. To improve on this, we propose an alignment-free algorithm based on sequential pattern mining, in which each sequence is converted into a binary representation of the sequential patterns shared among sequences. The phylogenetic tree is then constructed by clustering a distance matrix calculated from the pattern vectors. To increase accuracy for highly divergent sequences, we take pattern weight into account and filter out redundant sub-patterns. Results on both simulated and real data demonstrate that our method outperforms other alignment-free methods, especially for large sequence sets with low similarity. MDPI 2019-01-22 /pmc/articles/PMC6410268/ /pubmed/30678245 http://dx.doi.org/10.3390/genes10020073 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title PVTree: A Sequential Pattern Mining Method for Alignment Independent Phylogeny Reconstruction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6410268/
https://www.ncbi.nlm.nih.gov/pubmed/30678245
http://dx.doi.org/10.3390/genes10020073