
Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data

Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood...


Bibliographic Details
Main authors: Amin, Md Ruhul, Hasan, Mahmudul, Arnab, Sandipan Paul, DeGiorgio, Michael
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10581699/
https://www.ncbi.nlm.nih.gov/pubmed/37772983
http://dx.doi.org/10.1093/molbev/msad216
author Amin, Md Ruhul
Hasan, Mahmudul
Arnab, Sandipan Paul
DeGiorgio, Michael
collection PubMed
description Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under nonconvex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.
format Online
Article
Text
id pubmed-10581699
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10581699 2023-10-18 Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data Amin, Md Ruhul Hasan, Mahmudul Arnab, Sandipan Paul DeGiorgio, Michael Mol Biol Evol Methods
Oxford University Press 2023-09-29 /pmc/articles/PMC10581699/ /pubmed/37772983 http://dx.doi.org/10.1093/molbev/msad216 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data
topic Methods