Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics

Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguish...

Bibliographic Details
Main authors: Arnab, Sandipan Paul, Amin, Md Ruhul, DeGiorgio, Michael
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365025/
https://www.ncbi.nlm.nih.gov/pubmed/37433019
http://dx.doi.org/10.1093/molbev/msad157
_version_ 1785076963939975168
author Arnab, Sandipan Paul
Amin, Md Ruhul
DeGiorgio, Michael
author_facet Arnab, Sandipan Paul
Amin, Md Ruhul
DeGiorgio, Michael
author_sort Arnab, Sandipan Paul
collection PubMed
description Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. We apply wavelet transform, multitaper spectral analysis, and S-transform to summary statistic arrays to achieve this goal. Each analysis method converts one-dimensional summary statistic arrays to two-dimensional images of spectral analysis, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining models using ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will represent a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data.
format Online
Article
Text
id pubmed-10365025
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10365025 2023-07-25 Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics Arnab, Sandipan Paul Amin, Md Ruhul DeGiorgio, Michael Mol Biol Evol Methods Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. We apply wavelet transform, multitaper spectral analysis, and S-transform to summary statistic arrays to achieve this goal. Each analysis method converts one-dimensional summary statistic arrays to two-dimensional images of spectral analysis, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining models using ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will represent a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data. Oxford University Press 2023-07-11 /pmc/articles/PMC10365025/ /pubmed/37433019 http://dx.doi.org/10.1093/molbev/msad157 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Methods
Arnab, Sandipan Paul
Amin, Md Ruhul
DeGiorgio, Michael
Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics
title Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics
title_full Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics
title_fullStr Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics
title_full_unstemmed Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics
title_short Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics
title_sort uncovering footprints of natural selection through spectral analysis of genomic summary statistics
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365025/
https://www.ncbi.nlm.nih.gov/pubmed/37433019
http://dx.doi.org/10.1093/molbev/msad157
work_keys_str_mv AT arnabsandipanpaul uncoveringfootprintsofnaturalselectionthroughspectralanalysisofgenomicsummarystatistics
AT aminmdruhul uncoveringfootprintsofnaturalselectionthroughspectralanalysisofgenomicsummarystatistics
AT degiorgiomichael uncoveringfootprintsofnaturalselectionthroughspectralanalysisofgenomicsummarystatistics