Evolutionary Sparse Learning for Phylogenomics

We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci—such as genes, proteins, genomic segments, and positions—as parameters. Using the Least Absolute Shrinkage and Selec...

Bibliographic Details
Main authors:	Kumar, Sudhir; Sharma, Sudip
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Perspectives
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8557465/ https://www.ncbi.nlm.nih.gov/pubmed/34343318 http://dx.doi.org/10.1093/molbev/msab227

_version_	1784592377088835584
author	Kumar, Sudhir; Sharma, Sudip
author_facet	Kumar, Sudhir; Sharma, Sudip
author_sort	Kumar, Sudhir
collection	PubMed
description	We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci—such as genes, proteins, genomic segments, and positions—as parameters. Using the Least Absolute Shrinkage and Selection Operator, ESL selects only the most important genomic loci to explain a given phylogenetic hypothesis or presence/absence of a trait. ESL models do not directly involve conventional parameters such as rates of substitutions between nucleotides, rate variation among positions, and phylogeny branch lengths. Instead, ESL directly employs the concordance of variation across sequences in an alignment with the evolutionary hypothesis of interest. ESL provides a natural way to combine different molecular and nonmolecular data types and incorporate biological and functional annotations of genomic loci in model building. We propose positional, gene, function, and hypothesis sparsity scores, illustrate their use through an example, and suggest several applications of ESL. The ESL framework has the potential to drive the development of a new class of computational methods that will complement traditional approaches in evolutionary genomics, particularly for identifying influential loci and sequences given a phylogeny and building models to test hypotheses. ESL’s fast computational times and small memory footprint will also help democratize big data analytics and improve scientific rigor in phylogenomics.
format	Online Article Text
id	pubmed-8557465
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-85574652021-11-01 Evolutionary Sparse Learning for Phylogenomics Kumar, Sudhir Sharma, Sudip Mol Biol Evol Perspectives We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci—such as genes, proteins, genomic segments, and positions—as parameters. Using the Least Absolute Shrinkage and Selection Operator, ESL selects only the most important genomic loci to explain a given phylogenetic hypothesis or presence/absence of a trait. ESL models do not directly involve conventional parameters such as rates of substitutions between nucleotides, rate variation among positions, and phylogeny branch lengths. Instead, ESL directly employs the concordance of variation across sequences in an alignment with the evolutionary hypothesis of interest. ESL provides a natural way to combine different molecular and nonmolecular data types and incorporate biological and functional annotations of genomic loci in model building. We propose positional, gene, function, and hypothesis sparsity scores, illustrate their use through an example, and suggest several applications of ESL. The ESL framework has the potential to drive the development of a new class of computational methods that will complement traditional approaches in evolutionary genomics, particularly for identifying influential loci and sequences given a phylogeny and building models to test hypotheses. ESL’s fast computational times and small memory footprint will also help democratize big data analytics and improve scientific rigor in phylogenomics. Oxford University Press 2021-08-03 /pmc/articles/PMC8557465/ /pubmed/34343318 http://dx.doi.org/10.1093/molbev/msab227 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Perspectives Kumar, Sudhir Sharma, Sudip Evolutionary Sparse Learning for Phylogenomics
title	Evolutionary Sparse Learning for Phylogenomics
title_full	Evolutionary Sparse Learning for Phylogenomics
title_fullStr	Evolutionary Sparse Learning for Phylogenomics
title_full_unstemmed	Evolutionary Sparse Learning for Phylogenomics
title_short	Evolutionary Sparse Learning for Phylogenomics
title_sort	evolutionary sparse learning for phylogenomics
topic	Perspectives
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8557465/ https://www.ncbi.nlm.nih.gov/pubmed/34343318 http://dx.doi.org/10.1093/molbev/msab227
work_keys_str_mv	AT kumarsudhir evolutionarysparselearningforphylogenomics AT sharmasudip evolutionarysparselearningforphylogenomics
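
Illustrative sketch (not part of the catalog record or the article): the description above says ESL uses the Least Absolute Shrinkage and Selection Operator to keep only the loci that best explain a phylogenetic hypothesis. The snippet below is a minimal, simplified stand-in for that idea, assuming a plain least-squares Lasso fitted by coordinate descent on a one-hot encoded toy alignment. The sequences, the +1/-1 clade labels, the penalty value, and all function names are hypothetical, and the article's own model and sparsity scores may differ.

```python
# Illustrative sketch only: a plain least-squares Lasso on a toy alignment.
# Data, labels, names, and the penalty value are hypothetical assumptions.

def one_hot_alignment(seqs):
    """Turn each (position, base) pair seen in the alignment into a 0/1 column."""
    length = len(seqs[0])
    features = sorted({(p, s[p]) for s in seqs for p in range(length)})
    columns = [[1.0 if s[p] == b else 0.0 for s in seqs] for p, b in features]
    return features, columns

def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(columns, y, lam, sweeps=100):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n = len(y)
    cols = [center(c) for c in columns]
    residual = center(y)
    weights = [0.0] * len(cols)
    for _ in range(sweeps):
        for j, x in enumerate(cols):
            z = sum(v * v for v in x) / n
            if z == 0.0:  # invariant column carries no signal
                continue
            rho = sum(xi * (ri + xi * weights[j])
                      for xi, ri in zip(x, residual)) / n
            new_w = soft_threshold(rho, lam) / z
            delta = new_w - weights[j]
            residual = [ri - xi * delta for xi, ri in zip(x, residual)]
            weights[j] = new_w
    return weights

def selected_positions(features, weights, tol=1e-6):
    """Return 0-based alignment positions with a non-negligible coefficient."""
    return sorted({p for (p, _), w in zip(features, weights) if abs(w) > tol})

# Toy alignment: the first three sequences belong to the clade under test.
seqs = ["ACGT", "ACGA", "ACGT", "ATGT", "ATGA", "ATGA"]
labels = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

features, columns = one_hot_alignment(seqs)
weights = lasso(columns, labels, lam=0.1)
print(selected_positions(features, weights))
```

In this toy case only the second alignment column (0-based position 1) separates the clade from the other sequences perfectly, so the penalty leaves it as the single selected position; the invariant columns and the weakly associated last column are shrunk to zero.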
