
Haplotype and population structure inference using neural networks in whole-genome sequencing data

Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-gen...


Bibliographic Details
Main authors: Meisner, Jonas; Albrechtsen, Anders
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9435741/
https://www.ncbi.nlm.nih.gov/pubmed/35794006
http://dx.doi.org/10.1101/gr.276813.122
author Meisner, Jonas
Albrechtsen, Anders
collection PubMed
description Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.
format Online
Article
Text
id pubmed-9435741
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-9435741 2023-02-01 Haplotype and population structure inference using neural networks in whole-genome sequencing data Meisner, Jonas Albrechtsen, Anders Genome Res Method Cold Spring Harbor Laboratory Press 2022-08 /pmc/articles/PMC9435741/ /pubmed/35794006 http://dx.doi.org/10.1101/gr.276813.122 Text en © 2022 Meisner and Albrechtsen; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Haplotype and population structure inference using neural networks in whole-genome sequencing data
topic Method