HaploBlocks: Efficient Detection of Positive Selection in Large Population Genomic Datasets

Bibliographic Details
Main Authors: Kirsch-Gerweck, Benedikt, Bohnenkämper, Leonard, Henrichs, Michel T, Alanko, Jarno N, Bannai, Hideo, Cazaux, Bastien, Peterlongo, Pierre, Burger, Joachim, Stoye, Jens, Diekmann, Yoan
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985328/
https://www.ncbi.nlm.nih.gov/pubmed/36790822
http://dx.doi.org/10.1093/molbev/msad027
author Kirsch-Gerweck, Benedikt
Bohnenkämper, Leonard
Henrichs, Michel T
Alanko, Jarno N
Bannai, Hideo
Cazaux, Bastien
Peterlongo, Pierre
Burger, Joachim
Stoye, Jens
Diekmann, Yoan
collection PubMed
description Genomic regions under positive selection harbor variation linked, for example, to adaptation. Most tools for detecting positively selected variants have computational resource requirements that render them impractical on population genomic datasets with hundreds of thousands of individuals or more. We have developed and implemented an efficient haplotype-based approach that can scan large datasets and accurately detect positive selection. We achieve this by combining a pattern-matching approach based on the positional Burrows–Wheeler transform with model-based inference that requires only the evaluation of closed-form expressions. We evaluate our approach with simulations and find it to be both sensitive and specific. The computational resource requirements, quantified using UK Biobank data, indicate that our implementation scales to population genomic datasets with millions of individuals. Our approach may serve as an algorithmic blueprint for the era of “big data” genomics: a combinatorial core coupled with statistical inference in closed form.
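The description names the positional Burrows–Wheeler transform (PBWT) as the pattern-matching core. As a rough illustration only, the sketch below builds PBWT prefix arrays and divergence arrays following the construction described by Durbin (2014); it is not the HaploBlocks implementation. The function name `pbwt_arrays` and the input format (a list of 0/1 haplotype rows over biallelic sites) are assumptions made for this example.

```python
from typing import List, Tuple


def pbwt_arrays(haplotypes: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Build PBWT prefix arrays a_k and divergence arrays d_k for k = 0..N.

    haplotypes: M rows of N biallelic sites coded 0/1 (assumed input format).
    a_k lists haplotype indices sorted by their reversed prefixes of length k.
    d_k[i] is the start site of the longest match, ending before site k,
    between haplotype a_k[i] and its predecessor a_k[i-1] (value k: no match).
    """
    m = len(haplotypes)
    n = len(haplotypes[0]) if m else 0
    a = list(range(m))  # a_0: original input order
    d = [0] * m         # d_0: all zeros
    prefixes, divergences = [a[:]], [d[:]]
    for k in range(n):
        a0, a1, d0, d1 = [], [], [], []
        # p (q) tracks the largest divergence seen since the last
        # haplotype carrying allele 0 (allele 1) at site k
        p = q = k + 1
        for idx, start in zip(a, d):
            p = max(p, start)
            q = max(q, start)
            if haplotypes[idx][k] == 0:
                a0.append(idx)
                d0.append(p)
                p = 0
            else:
                a1.append(idx)
                d1.append(q)
                q = 0
        # Stable partition: allele-0 haplotypes first, then allele-1 haplotypes
        a, d = a0 + a1, d0 + d1
        prefixes.append(a[:])
        divergences.append(d[:])
    return prefixes, divergences


if __name__ == "__main__":
    haps = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 1], [0, 1, 0, 1]]
    prefixes, divergences = pbwt_arrays(haps)
    print(prefixes[-1], divergences[-1])
```

For the four example haplotypes, the final prefix array is `[1, 0, 3, 2]` and the final divergence array is `[4, 4, 0, 3]`: haplotypes 0 and 3 are adjacent with divergence 0 because they match over all four sites. Runs of adjacent haplotypes with small divergence values are the kind of shared-haplotype structure a block-based scan could build on; how HaploBlocks itself defines and scores blocks is described in the article, not in this sketch.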
id pubmed-9985328
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9985328 2023-03-05 Mol Biol Evol, Methods. Oxford University Press 2023-02-15. © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methods