Evolutionary Sparse Learning for Phylogenomics
We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci—such as genes, proteins, genomic segments, and positions—as parameters. Using the Least Absolute Shrinkage and Selec...
Main authors: | Kumar, Sudhir; Sharma, Sudip |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Perspectives |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8557465/ https://www.ncbi.nlm.nih.gov/pubmed/34343318 http://dx.doi.org/10.1093/molbev/msab227 |
_version_ | 1784592377088835584 |
---|---|
author | Kumar, Sudhir; Sharma, Sudip |
author_facet | Kumar, Sudhir; Sharma, Sudip |
author_sort | Kumar, Sudhir |
collection | PubMed |
description | We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci—such as genes, proteins, genomic segments, and positions—as parameters. Using the Least Absolute Shrinkage and Selection Operator, ESL selects only the most important genomic loci to explain a given phylogenetic hypothesis or presence/absence of a trait. ESL models do not directly involve conventional parameters such as rates of substitutions between nucleotides, rate variation among positions, and phylogeny branch lengths. Instead, ESL directly employs the concordance of variation across sequences in an alignment with the evolutionary hypothesis of interest. ESL provides a natural way to combine different molecular and nonmolecular data types and incorporate biological and functional annotations of genomic loci in model building. We propose positional, gene, function, and hypothesis sparsity scores, illustrate their use through an example, and suggest several applications of ESL. The ESL framework has the potential to drive the development of a new class of computational methods that will complement traditional approaches in evolutionary genomics, particularly for identifying influential loci and sequences given a phylogeny and building models to test hypotheses. ESL’s fast computational times and small memory footprint will also help democratize big data analytics and improve scientific rigor in phylogenomics. |
format | Online Article Text |
id | pubmed-8557465 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8557465 2021-11-01 Evolutionary Sparse Learning for Phylogenomics Kumar, Sudhir; Sharma, Sudip Mol Biol Evol Perspectives Oxford University Press 2021-08-03 /pmc/articles/PMC8557465/ /pubmed/34343318 http://dx.doi.org/10.1093/molbev/msab227 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Perspectives Kumar, Sudhir Sharma, Sudip Evolutionary Sparse Learning for Phylogenomics |
title | Evolutionary Sparse Learning for Phylogenomics |
title_full | Evolutionary Sparse Learning for Phylogenomics |
title_fullStr | Evolutionary Sparse Learning for Phylogenomics |
title_full_unstemmed | Evolutionary Sparse Learning for Phylogenomics |
title_short | Evolutionary Sparse Learning for Phylogenomics |
title_sort | evolutionary sparse learning for phylogenomics |
topic | Perspectives |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8557465/ https://www.ncbi.nlm.nih.gov/pubmed/34343318 http://dx.doi.org/10.1093/molbev/msab227 |
work_keys_str_mv | AT kumarsudhir evolutionarysparselearningforphylogenomics AT sharmasudip evolutionarysparselearningforphylogenomics |
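
The description field above says that ESL uses the Least Absolute Shrinkage and Selection Operator to keep only the most important loci for a given phylogenetic hypothesis. The Python sketch below illustrates that general idea only. Everything in it is an assumption made for illustration: the toy alignment, the +1/-1 clade labels, the penalty value `lam`, and the helper names `one_hot_columns`, `soft_threshold`, and `lasso` are hypothetical and do not come from the record or from the authors' software. It fits a plain least-squares Lasso by coordinate descent, which may differ from the formulation used in the article.

```python
# Toy illustration: Lasso picks alignment positions concordant with a hypothesis.
# All data and names here are hypothetical, not taken from the ESL paper.

def one_hot_columns(seqs):
    """Encode every (position, base) pair seen in the alignment as a 0/1 column."""
    names, cols = [], []
    for pos in range(len(seqs[0])):
        for base in sorted({s[pos] for s in seqs}):
            names.append((pos, base))
            cols.append([1.0 if s[pos] == base else 0.0 for s in seqs])
    return names, cols


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso(cols, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - b0 - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n = len(y)
    w = [0.0] * len(cols)
    b0 = sum(y) / n
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        # Refit the unpenalised intercept to the current residuals.
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
        for j, col in enumerate(cols):
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the residual that excludes its own effect.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, lam) / norm
            delta = new_w - w[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                w[j] = new_w
    return b0, w


# Six toy sequences; the first three are assumed to belong to the clade of interest.
seqs = ["AACT", "CATT", "AAGT", "CGCT", "AGTT", "CGGT"]
labels = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

names, cols = one_hot_columns(seqs)
intercept, weights = lasso(cols, labels, lam=0.2)

# Features with non-zero weight are the ones the sparse model retains.
selected = [name for name, wj in zip(names, weights) if abs(wj) > 1e-9]
print(selected)
```

In this toy alignment only position 1 (zero-based) separates the two groups perfectly, so a sufficiently large penalty leaves weights on that position alone. Lowering `lam` would let weaker, partially concordant positions enter the model.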