Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics
Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguish...
Main authors: | Arnab, Sandipan Paul; Amin, Md Ruhul; DeGiorgio, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Methods |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365025/ https://www.ncbi.nlm.nih.gov/pubmed/37433019 http://dx.doi.org/10.1093/molbev/msad157 |
_version_ | 1785076963939975168 |
---|---|
author | Arnab, Sandipan Paul Amin, Md Ruhul DeGiorgio, Michael |
author_facet | Arnab, Sandipan Paul Amin, Md Ruhul DeGiorgio, Michael |
author_sort | Arnab, Sandipan Paul |
collection | PubMed |
description | Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. We apply wavelet transform, multitaper spectral analysis, and S-transform to summary statistic arrays to achieve this goal. Each analysis method converts one-dimensional summary statistic arrays to two-dimensional images of spectral analysis, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining models using ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will represent a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data. |
format | Online Article Text |
id | pubmed-10365025 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10365025 2023-07-25 Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics Arnab, Sandipan Paul Amin, Md Ruhul DeGiorgio, Michael Mol Biol Evol Methods Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. We apply wavelet transform, multitaper spectral analysis, and S-transform to summary statistic arrays to achieve this goal. Each analysis method converts one-dimensional summary statistic arrays to two-dimensional images of spectral analysis, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining models using ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will represent a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data. Oxford University Press 2023-07-11 /pmc/articles/PMC10365025/ /pubmed/37433019 http://dx.doi.org/10.1093/molbev/msad157 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Arnab, Sandipan Paul Amin, Md Ruhul DeGiorgio, Michael Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics |
title | Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics |
title_full | Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics |
title_fullStr | Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics |
title_full_unstemmed | Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics |
title_short | Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics |
title_sort | uncovering footprints of natural selection through spectral analysis of genomic summary statistics |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365025/ https://www.ncbi.nlm.nih.gov/pubmed/37433019 http://dx.doi.org/10.1093/molbev/msad157 |
work_keys_str_mv | AT arnabsandipanpaul uncoveringfootprintsofnaturalselectionthroughspectralanalysisofgenomicsummarystatistics AT aminmdruhul uncoveringfootprintsofnaturalselectionthroughspectralanalysisofgenomicsummarystatistics AT degiorgiomichael uncoveringfootprintsofnaturalselectionthroughspectralanalysisofgenomicsummarystatistics |
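
The description in this record says that each spectral method converts a one-dimensional summary statistic array into a two-dimensional image. The following is a minimal sketch of that idea for the wavelet case only, and it is not the authors' implementation. The Ricker (Mexican-hat) wavelet, the mean-centering step, the scale values, the function names `ricker` and `cwt_image`, and the synthetic statistic are all assumptions made for illustration; the record does not state which wavelet or parameters the paper uses.

```python
import math

def ricker(t, scale):
    """Ricker (Mexican-hat) wavelet at integer offset t for a given scale."""
    x = t / scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x)

def cwt_image(signal, scales):
    """Return a len(scales) x len(signal) matrix of wavelet coefficients.

    The signal is mean-centered first (an assumption of this sketch) so that
    a constant baseline does not produce large responses at the array edges.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    image = []
    for s in scales:
        half = int(math.ceil(5 * s))  # truncate wavelet support at +/- 5 scales
        row = []
        for pos in range(n):
            acc = 0.0
            for k in range(-half, half + 1):
                j = pos + k
                if 0 <= j < n:  # positions outside the array contribute zero
                    acc += centered[j] * ricker(k, s)
            row.append(acc)
        image.append(row)
    return image

# Hypothetical summary statistic over 64 genomic windows:
# a flat baseline with a localized dip centered on window 32.
stat = [1.0 - 0.8 * math.exp(-((i - 32) ** 2) / 18.0) for i in range(64)]

# One row per scale, one column per genomic window.
image = cwt_image(stat, scales=[1, 2, 4, 8])
```

Each row of `image` corresponds to one scale and each column to one genomic window, so a localized distortion in the statistic appears as a feature at a specific position and scale. A matrix of this shape could then be passed to a convolutional neural network as a single-channel image, which is the general pipeline the description outlines.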