
Evolutionary Sparse Learning for Phylogenomics


Bibliographic Details
Main authors: Kumar, Sudhir; Sharma, Sudip
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Perspectives
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8557465/
https://www.ncbi.nlm.nih.gov/pubmed/34343318
http://dx.doi.org/10.1093/molbev/msab227
collection PubMed
description We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci—such as genes, proteins, genomic segments, and positions—as parameters. Using the Least Absolute Shrinkage and Selection Operator, ESL selects only the most important genomic loci to explain a given phylogenetic hypothesis or presence/absence of a trait. ESL models do not directly involve conventional parameters such as rates of substitutions between nucleotides, rate variation among positions, and phylogeny branch lengths. Instead, ESL directly employs the concordance of variation across sequences in an alignment with the evolutionary hypothesis of interest. ESL provides a natural way to combine different molecular and nonmolecular data types and incorporate biological and functional annotations of genomic loci in model building. We propose positional, gene, function, and hypothesis sparsity scores, illustrate their use through an example, and suggest several applications of ESL. The ESL framework has the potential to drive the development of a new class of computational methods that will complement traditional approaches in evolutionary genomics, particularly for identifying influential loci and sequences given a phylogeny and building models to test hypotheses. ESL’s fast computational times and small memory footprint will also help democratize big data analytics and improve scientific rigor in phylogenomics.
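The selection step described in the abstract can be illustrated with a minimal sketch. It assumes a toy six-taxon alignment, a +1/-1 coding of clade membership as the hypothesis, one-hot columns for each (position, base) pair, and a plain coordinate-descent LASSO solver; the data, the function names (`one_hot`, `soft_threshold`, `lasso`), the penalty `lam=0.1`, and the per-position score (sum of absolute coefficients) are all illustrative assumptions, not the authors' implementation or their definition of the sparsity scores.

```python
# Toy alignment (hypothetical data): one string per taxon, one character per position.
seqs = ["ACGT", "ACGA", "ACGT", "GCTT", "GCTA", "GCGT"]
# Hypothesis coding (assumption): +1 for taxa inside the clade of interest, -1 outside.
y = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

def one_hot(seqs):
    """Encode every (position, base) pair seen in the alignment as a 0/1 column."""
    length = len(seqs[0])
    cols = [(p, b) for p in range(length) for b in sorted({s[p] for s in seqs})]
    X = [[1.0 if s[p] == b else 0.0 for (p, b) in cols] for s in seqs]
    return X, cols

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X*beta||^2 + lam*||beta||_1."""
    n, m = len(X), len(X[0])
    beta, b0 = [0.0] * m, sum(y) / n
    for _ in range(n_iter):
        for j in range(m):
            rho = 0.0
            for i in range(n):
                # Prediction from the intercept and every feature except j.
                partial = b0 + sum(X[i][k] * beta[k] for k in range(m) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
        # The intercept is not penalized: set it to the mean residual.
        b0 = sum(y[i] - sum(X[i][k] * beta[k] for k in range(m)) for i in range(n)) / n
    return b0, beta

X, cols = one_hot(seqs)
intercept, beta = lasso(X, y, lam=0.1)

# Illustrative per-position score: sum of absolute coefficients over that position's bases.
scores = [0.0] * len(seqs[0])
for (pos, _base), b in zip(cols, beta):
    scores[pos] += abs(b)
selected = [p for p, s in enumerate(scores) if s > 0.0]
print(selected, [round(s, 3) for s in scores])
```

In this toy case only the first position separates the two groups perfectly, so it is the only one that keeps nonzero coefficients; the invariant, partially concordant, and uninformative positions are shrunk to exactly zero. No substitution rates or branch lengths enter the model, which matches the abstract's point that the fit depends only on how well variation in the alignment agrees with the hypothesis.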
id pubmed-8557465
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8557465 2021-11-01 Evolutionary Sparse Learning for Phylogenomics Kumar, Sudhir Sharma, Sudip Mol Biol Evol Perspectives Oxford University Press 2021-08-03 /pmc/articles/PMC8557465/ /pubmed/34343318 http://dx.doi.org/10.1093/molbev/msab227 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Perspectives