
A greedy alignment-free distance estimator for phylogenetic inference

BACKGROUND: Alignment-free sequence comparison approaches have been garnering increasing interest in various data- and compute-intensive applications such as phylogenetic inference for large-scale sequences. While k-mer based methods are predominantly used in real applications, the average common su...


Bibliographic Details
Main authors: Thankachan, Sharma V., Chockalingam, Sriram P., Liu, Yongchao, Krishnan, Ambujam, Aluru, Srinivas
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5471951/
https://www.ncbi.nlm.nih.gov/pubmed/28617225
http://dx.doi.org/10.1186/s12859-017-1658-0
author Thankachan, Sharma V.
Chockalingam, Sriram P.
Liu, Yongchao
Krishnan, Ambujam
Aluru, Srinivas
collection PubMed
description BACKGROUND: Alignment-free sequence comparison approaches have been garnering increasing interest in various data- and compute-intensive applications such as phylogenetic inference for large-scale sequences. While k-mer based methods are predominantly used in real applications, the average common substring (ACS) approach is emerging as one of the prominent alignment-free approaches. This ACS approach has been further generalized by some recent work, either greedily or exactly, by allowing a bounded number of mismatches in the common substrings. RESULTS: We present ALFRED-G, a greedy alignment-free distance estimator for phylogenetic tree reconstruction based on the concept of the generalized ACS approach. In this algorithm, we have investigated a new heuristic to efficiently compute the lengths of common strings with mismatches allowed, and have further applied this heuristic to phylogeny reconstruction. Performance evaluation using real sequence datasets shows that our heuristic is able to reconstruct comparable, or even more accurate, phylogenetic tree topologies than the kmacs heuristic algorithm at highly competitive speed. CONCLUSIONS: ALFRED-G is an alignment-free heuristic for evolutionary distance estimation between two biological sequences. This algorithm is implemented in C++ and has been incorporated into our open-source ALFRED software package (http://alurulab.cc.gatech.edu/phylo).
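The abstract above describes the average common substring (ACS) idea and its generalization to a bounded number of mismatches. The following is a minimal brute-force sketch of that idea, not the ALFRED-G heuristic or the kmacs algorithm. The function names are hypothetical, the distance formula is the form commonly used in the ACS literature (an assumption, since this record does not state it), and non-empty input sequences are assumed.

```python
import math


def longest_match_lengths(a, b, k=0):
    """For each start position i in a, return the longest length L such that
    a[i:i+L] matches some b[j:j+L] with at most k mismatches.
    Brute force over all (i, j) pairs; illustrative only."""
    lengths = []
    for i in range(len(a)):
        best = 0
        for j in range(len(b)):
            mismatches = 0
            length = 0
            while i + length < len(a) and j + length < len(b):
                if a[i + length] != b[j + length]:
                    if mismatches == k:
                        break  # mismatch budget exhausted
                    mismatches += 1
                length += 1
            best = max(best, length)
        lengths.append(best)
    return lengths


def acs_distance(a, b, k=0):
    """Symmetric ACS-style distance (assumed form):
    d(x, y) = log|y| / L(x, y) - log|x| / L(x, x),
    where L(x, y) is the average of the per-position match lengths and
    L(x, x) = (|x| + 1) / 2 exactly, because every suffix matches itself."""
    def directed(x, y):
        avg = sum(longest_match_lengths(x, y, k)) / len(x)
        if avg == 0:
            return math.inf  # no shared characters at all
        self_avg = (len(x) + 1) / 2
        return math.log(len(y)) / avg - math.log(len(x)) / self_avg

    return (directed(a, b) + directed(b, a)) / 2


if __name__ == "__main__":
    print(longest_match_lengths("ACGT", "ACTT"))       # exact matches only
    print(longest_match_lengths("ACGT", "ACTT", k=1))  # one mismatch allowed
    print(acs_distance("ACGT", "ACTT"))
```

Allowing mismatches lengthens the matches and therefore shrinks the distance between near-identical sequences. A practical tool would replace the quadratic scan with suffix-array or suffix-tree based data structures; this sketch only makes the quantity being estimated concrete.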
format Online
Article
Text
id pubmed-5471951
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5471951 2017-06-19 A greedy alignment-free distance estimator for phylogenetic inference Thankachan, Sharma V. Chockalingam, Sriram P. Liu, Yongchao Krishnan, Ambujam Aluru, Srinivas BMC Bioinformatics Research
BioMed Central 2017-06-07 /pmc/articles/PMC5471951/ /pubmed/28617225 http://dx.doi.org/10.1186/s12859-017-1658-0 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A greedy alignment-free distance estimator for phylogenetic inference
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5471951/
https://www.ncbi.nlm.nih.gov/pubmed/28617225
http://dx.doi.org/10.1186/s12859-017-1658-0