An invariants-based method for efficient identification of hybrid species from large-scale genomic data

Bibliographic details
Main authors:	Kubatko, Laura S.; Chifman, Julia
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6543680/ https://www.ncbi.nlm.nih.gov/pubmed/31146685 http://dx.doi.org/10.1186/s12862-019-1439-7

collection	PubMed
description	BACKGROUND: Coalescent-based species tree inference has become widely used in the analysis of genome-scale multilocus and SNP datasets when the goal is inference of a species-level phylogeny. However, numerous evolutionary processes are known to violate the assumptions of a coalescence-only model and complicate inference of the species tree. One such process is hybrid speciation, in which a species shares its ancestry with two distinct species. Although many methods have been proposed to detect hybrid speciation, only a few have considered both hybridization and coalescence in a unified framework, and these are generally limited to the setting in which putative hybrid species must be identified in advance. RESULTS: Here we propose a method that can examine genome-scale data for a large number of taxa and detect those taxa that may have arisen via hybridization, as well as their potential “parental” taxa. The method is based on a model that considers both coalescence and hybridization together, and uses phylogenetic invariants to construct a test that scales well in terms of computational time for both the number of taxa and the amount of sequence data. We test the method using simulated data for up to 20 taxa and 100,000 bp, and find that the method accurately identifies both recent and ancient hybrid species in less than 30 s. We apply the method to two empirical datasets, one composed of Sistrurus rattlesnakes for which hybrid speciation is not supported by previous work, and one consisting of several species of Heliconius butterflies for which some evidence of hybrid speciation has been previously found. CONCLUSIONS: The proposed method is powerful for detecting hybridization for both recent and ancient hybridization events. The computations required can be carried out rapidly for a large number of sequences using genome-scale data, and the method is appropriate for both SNP and multilocus data.
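Invariants-based tests on four-taxon subsets are commonly built from the observed frequencies of site patterns in an alignment. The sketch below is illustrative only and is not the authors' test statistic or software. It assumes a hypothetical quartet ordered as (P1, H, P2, Outgroup), where H is a candidate hybrid and P1/P2 are candidate parents, and simply tabulates the three two-state site patterns such a test would start from. The function name and the pattern labels `iijj`, `ijij`, `ijji` are hypothetical choices for this example.

```python
from collections import Counter


def quartet_site_pattern_freqs(p1: str, h: str, p2: str, out: str) -> dict:
    """Hypothetical helper: empirical frequencies of three two-state site
    patterns for an aligned quartet ordered (P1, H, P2, Outgroup).

    'iijj': P1 and H share one base; P2 and Outgroup share a different base.
    'ijij': P1 and P2 share one base; H and Outgroup share a different base.
    'ijji': P1 and Outgroup share one base; H and P2 share a different base.
    Frequencies are relative to all sites with unambiguous A/C/G/T in every taxon.
    """
    if not (len(p1) == len(h) == len(p2) == len(out)):
        raise ValueError("sequences must be aligned to the same length")

    counts = Counter()
    n_sites = 0
    for site in zip(p1.upper(), h.upper(), p2.upper(), out.upper()):
        # Skip gaps, Ns, and other ambiguity codes.
        if any(base not in "ACGT" for base in site):
            continue
        n_sites += 1
        a, b, c, d = site
        if a == b and c == d and a != c:
            counts["iijj"] += 1
        elif a == c and b == d and a != b:
            counts["ijij"] += 1
        elif a == d and b == c and a != b:
            counts["ijji"] += 1
        # Constant sites and other patterns only contribute to n_sites.

    if n_sites == 0:
        return {"iijj": 0.0, "ijij": 0.0, "ijji": 0.0}
    return {k: counts[k] / n_sites for k in ("iijj", "ijij", "ijji")}


if __name__ == "__main__":
    # Four usable sites (one of each pattern plus a constant site); the gap site is skipped.
    print(quartet_site_pattern_freqs("AAAA-", "ACCAA", "CACAA", "CCAAA"))
```

Counting over all usable sites (not just the three patterns) keeps the outputs interpretable as estimates of site-pattern probabilities. A real test would then combine such estimates into an invariant and assess its sampling variability, which this sketch does not attempt.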
institution	National Center for Biotechnology Information
journal	BMC Evol Biol (BioMed Central), published 2019-05-30
rights	© The Author(s) 2019. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.