LS(X): automated reduction of gene-specific lineage evolutionary rate heterogeneity for multi-gene phylogeny inference

BACKGROUND: Lineage rate heterogeneity can be a major source of bias, especially in multi-gene phylogeny inference. We had previously tackled this issue by developing LS(3), a data subselection algorithm that, by removing fast-evolving sequences in a gene-specific manner, identifies subsets of seque...

Bibliographic Details
Main Authors:	Rivera-Rivera, Carlos J., Montoya-Burgos, Juan I.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6693147/ https://www.ncbi.nlm.nih.gov/pubmed/31409290 http://dx.doi.org/10.1186/s12859-019-3020-1

_version_	1783443652913135616
author	Rivera-Rivera, Carlos J. Montoya-Burgos, Juan I.
author_facet	Rivera-Rivera, Carlos J. Montoya-Burgos, Juan I.
author_sort	Rivera-Rivera, Carlos J.
collection	PubMed
description	BACKGROUND: Lineage rate heterogeneity can be a major source of bias, especially in multi-gene phylogeny inference. We had previously tackled this issue by developing LS(3), a data subselection algorithm that, by removing fast-evolving sequences in a gene-specific manner, identifies subsets of sequences that evolve at a relatively homogeneous rate. However, this algorithm had two major shortcomings: (i) it was automated and published as a set of bash scripts, and hence was Linux-specific and not user-friendly, and (ii) it could result in very stringent sequence subselection when extremely slow-evolving sequences were present. RESULTS: We address these challenges and produce a new, platform-independent program, LS(X), written in R, which includes a reprogrammed version of the original LS(3) algorithm and has added features to make better lineage rate calculations. In addition, we developed and included an alternative version of the algorithm, LS(4), which reduces lineage rate heterogeneity by detecting sequences that evolve too fast and sequences that evolve too slowly, resulting in less stringent data subselection when extremely slow-evolving sequences are present. The efficiency of LS(X) and of LS(4) on datasets containing extremely slow-evolving sequences is demonstrated with simulated data, and by the resolution of a contentious node in the catfish phylogeny that was affected by unusually high lineage rate heterogeneity in the dataset. CONCLUSIONS: LS(X) is a new bioinformatic tool, with accessible code, with which the effect of lineage rate heterogeneity can be explored in gene sequence datasets of virtually any size. In addition, the two modalities of the sequence subsampling algorithm included, LS(3) and LS(4), allow the user to optimize the amount of non-phylogenetic signal removed while keeping a maximum of phylogenetic signal. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3020-1) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6693147
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6693147 2019-08-16 LS(X): automated reduction of gene-specific lineage evolutionary rate heterogeneity for multi-gene phylogeny inference Rivera-Rivera, Carlos J. Montoya-Burgos, Juan I. BMC Bioinformatics Software BioMed Central 2019-08-13 /pmc/articles/PMC6693147/ /pubmed/31409290 http://dx.doi.org/10.1186/s12859-019-3020-1 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Software Rivera-Rivera, Carlos J. Montoya-Burgos, Juan I. LS(X): automated reduction of gene-specific lineage evolutionary rate heterogeneity for multi-gene phylogeny inference
title	LS(X): automated reduction of gene-specific lineage evolutionary rate heterogeneity for multi-gene phylogeny inference
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6693147/ https://www.ncbi.nlm.nih.gov/pubmed/31409290 http://dx.doi.org/10.1186/s12859-019-3020-1
work_keys_str_mv	AT riverariveracarlosj lsxautomatedreductionofgenespecificlineageevolutionaryrateheterogeneityformultigenephylogenyinference AT montoyaburgosjuani lsxautomatedreductionofgenespecificlineageevolutionaryrateheterogeneityformultigenephylogenyinference
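
The abstract in this record describes two subselection modes: one that removes only fast-evolving sequences per gene, and one that also detects sequences that evolve too slowly. The Python sketch below is only a toy illustration of that contrast, not the published LS(X) program (which is written in R) and not its actual statistical criterion. The function name, the fastest/slowest ratio cutoff, the median rule, and the example values are all assumptions made for illustration.

```python
from statistics import median


def subselect_homogeneous(path_lengths, max_ratio=2.0, trim_slow=False):
    """Toy rate-based subselection for ONE gene (illustrative only).

    path_lengths: dict mapping taxon name -> root-to-tip branch-length sum,
        used here as a stand-in for that lineage's rate in this gene.
    max_ratio: hypothetical homogeneity cutoff on fastest/slowest.
    trim_slow: False flags only the fastest lineage in each round;
        True flags whichever extreme lies further from the median.
    Returns (sorted kept taxa, flagged taxa in removal order).
    """
    kept = dict(path_lengths)
    flagged = []
    while len(kept) > 2:
        slowest = min(kept, key=kept.get)
        fastest = max(kept, key=kept.get)
        # Stop once the remaining lineages are "homogeneous enough".
        if kept[fastest] <= max_ratio * kept[slowest]:
            break
        target = fastest
        if trim_slow:
            mid = median(kept.values())
            if mid - kept[slowest] > kept[fastest] - mid:
                target = slowest
        flagged.append(target)
        del kept[target]
    return sorted(kept), flagged


# Hypothetical path lengths for one gene: one fast and one very slow lineage.
example = {"A": 0.10, "B": 0.12, "C": 0.11, "D": 0.50, "slow": 0.01}

print(subselect_homogeneous(example))                  # (['A', 'slow'], ['D', 'B', 'C'])
print(subselect_homogeneous(example, trim_slow=True))  # (['A', 'B', 'C'], ['D', 'slow'])
```

In this toy example the fast-only mode keeps discarding sequences because the very slow lineage makes everything else look fast, leaving two taxa. The two-sided mode flags the fast and the slow outlier and keeps three. For a multi-gene dataset the function would be called once per gene, since the abstract describes the removal as gene-specific.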
