
Estimation of Cross-Species Introgression Rates Using Genomic Data Despite Model Unidentifiability


Bibliographic Details
Main authors: Yang, Ziheng; Flouri, Tomáš
Format: Online Article Text
Language: English
Published: Oxford University Press, 2022
Subjects: Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9087891/
https://www.ncbi.nlm.nih.gov/pubmed/35417543
http://dx.doi.org/10.1093/molbev/msac083
author Yang, Ziheng
Flouri, Tomáš
collection PubMed
description Full-likelihood implementations of the multispecies coalescent with introgression (MSci) model treat genealogical fluctuations across the genome as a major source of information to infer the history of species divergence and gene flow using multilocus sequence data. However, MSci models are known to have unidentifiability issues, whereby different models or parameters make the same predictions about the data and cannot be distinguished by the data. Previous studies of unidentifiability have focused on heuristic methods based on gene trees and do not make efficient use of the information in the data. Here we study the unidentifiability of MSci models under full-likelihood methods. We characterize the unidentifiability of the bidirectional introgression (BDI) model, which assumes that gene flow occurs in both directions. BDI models create unidentifiability of the label-switching type, and we derive simple rules that characterize it for arbitrary BDI models. In general, an MSci model with k BDI events has [Formula: see text] unidentifiable modes or towers in the posterior, with each BDI event between sister species creating within-model parameter unidentifiability and each BDI event between nonsister species creating between-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo samples to remove label-switching problems and implement them in the bpp program. We analyze real and synthetic data to illustrate the utility of the BDI models and the new algorithms. We discuss the unidentifiability of heuristic methods and provide guidelines for the use of MSci models to infer gene flow using genomic data.
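The general idea of post-processing MCMC samples to undo label switching can be illustrated with a generic relabeling sketch. This is not the algorithm implemented in bpp. It assumes that the equivalent (mirror) forms of each sample can be generated by a known list of symmetry functions, and it maps each sample to the form closest, in squared Euclidean distance, to a running reference point, in the spirit of standard relabeling methods for mixture models. The parameter layout (theta_a, theta_b, phi), the `mirror` function and the example draws are hypothetical toy choices, not the exact symmetry of a BDI model.

```python
def identity(sample):
    return sample


def mirror(sample):
    # HYPOTHETICAL toy symmetry for illustration only: swap the two theta
    # values and reflect phi. It is not the exact mapping for a BDI event.
    theta_a, theta_b, phi = sample
    return (theta_b, theta_a, 1.0 - phi)


def sq_dist(u, v):
    # Squared Euclidean distance between two parameter vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(samples):
    # Component-wise mean of a list of equal-length tuples.
    n = len(samples)
    return tuple(sum(col) / n for col in zip(*samples))


def relabel(samples, transforms, max_iter=100):
    """Map every sample to its equivalent form closest to a reference point.

    The reference starts at the first sample (to break the symmetry between
    modes) and is then updated to the mean of the relabeled samples until
    the assignments stop changing or max_iter is reached.
    """
    if not samples:
        return []
    ref = samples[0]
    current = None
    for _ in range(max_iter):
        # For each sample, keep the equivalent form nearest the reference.
        updated = [
            min((t(s) for t in transforms), key=lambda x: sq_dist(x, ref))
            for s in samples
        ]
        if updated == current:
            break
        current = updated
        ref = mean(current)
    return current


if __name__ == "__main__":
    # Made-up draws that jump between two mirror-image modes.
    draws = [(0.01, 0.03, 0.2), (0.03, 0.01, 0.8),
             (0.011, 0.029, 0.25), (0.029, 0.011, 0.75)]
    for row in relabel(draws, [identity, mirror]):
        print(row)
```

With the single toy symmetry, all four draws end up in one mode (phi below 0.5, theta_a below theta_b). A real analysis would instead use the transforms implied by the model's own unidentifiability rules; seeding the reference with one sample avoids the tie that a symmetric mixture of modes would produce at the overall mean.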
format Online
Article
Text
id pubmed-9087891
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9087891 2022-05-11 Mol Biol Evol Methods Oxford University Press 2022-04-13 /pmc/articles/PMC9087891/ /pubmed/35417543 http://dx.doi.org/10.1093/molbev/msac083 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Estimation of Cross-Species Introgression Rates Using Genomic Data Despite Model Unidentifiability
topic Methods