
Hybrid autoencoder with orthogonal latent space for robust population structure inference

Analysis of population structure and genomic ancestry remains an important topic in human genetics and bioinformatics. Commonly used methods require high-quality genotype data to ensure accurate inference. However, in practice, laboratory artifacts and outliers are often present in the data. Moreove...


Bibliographic Details
Main authors: Yuan, Meng, Hoskens, Hanne, Goovaerts, Seppe, Herrick, Noah, Shriver, Mark D., Walsh, Susan, Claes, Peter
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9929087/
https://www.ncbi.nlm.nih.gov/pubmed/36788253
http://dx.doi.org/10.1038/s41598-023-28759-x
_version_ 1784888771224797184
author Yuan, Meng
Hoskens, Hanne
Goovaerts, Seppe
Herrick, Noah
Shriver, Mark D.
Walsh, Susan
Claes, Peter
collection PubMed
description Analysis of population structure and genomic ancestry remains an important topic in human genetics and bioinformatics. Commonly used methods require high-quality genotype data to ensure accurate inference. However, in practice, laboratory artifacts and outliers are often present in the data. Moreover, existing methods are typically affected by the presence of related individuals in the dataset. In this work, we propose a novel hybrid method, called SAE-IBS, which combines the strengths of traditional matrix decomposition-based (e.g., principal component analysis) and more recent neural network-based (e.g., autoencoders) solutions. Namely, it yields an orthogonal latent space enhancing dimensionality selection while learning non-linear transformations. The proposed approach achieves higher accuracy than existing methods for projecting poor quality target samples (genotyping errors and missing data) onto a reference ancestry space and generates a robust ancestry space in the presence of relatedness. We introduce a new approach and an accompanying open-source program for robust ancestry inference in the presence of missing data, genotyping errors, and relatedness. The obtained ancestry space allows for non-linear projections and exhibits orthogonality with clearly separable population groups.
format Online
Article
Text
id pubmed-9929087
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9929087 2023-02-16 Sci Rep Article
Nature Publishing Group UK 2023-02-14 /pmc/articles/PMC9929087/ /pubmed/36788253 http://dx.doi.org/10.1038/s41598-023-28759-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Hybrid autoencoder with orthogonal latent space for robust population structure inference
topic Article