
Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images

Estimates of individual-level genomic ancestry are routinely used in human genetics, and related fields. The analysis of population structure and genomic ancestry can yield insights in terms of modern and ancient populations, allowing us to address questions regarding admixture, and the numbers and...

Bibliographic Details
Main Authors: Li, Jiarui, Zarzar, Tomás González, White, Julie D., Indencleef, Karlijne, Hoskens, Hanne, Matthews, Harry, Nauwelaers, Nele, Zaidi, Arslan, Eller, Ryan J., Herrick, Noah, Günther, Torsten, Svensson, Emma M., Jakobsson, Mattias, Walsh, Susan, Van Steen, Kristel, Shriver, Mark D., Claes, Peter
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7367291/
https://www.ncbi.nlm.nih.gov/pubmed/32678112
http://dx.doi.org/10.1038/s41598-020-68259-w
author Li, Jiarui
Zarzar, Tomás González
White, Julie D.
Indencleef, Karlijne
Hoskens, Hanne
Matthews, Harry
Nauwelaers, Nele
Zaidi, Arslan
Eller, Ryan J.
Herrick, Noah
Günther, Torsten
Svensson, Emma M.
Jakobsson, Mattias
Walsh, Susan
Van Steen, Kristel
Shriver, Mark D.
Claes, Peter
collection PubMed
description Estimates of individual-level genomic ancestry are routinely used in human genetics, and related fields. The analysis of population structure and genomic ancestry can yield insights in terms of modern and ancient populations, allowing us to address questions regarding admixture, and the numbers and identities of the parental source populations. Unrecognized population structure is also an important confounder to correct for in genome-wide association studies. However, it remains challenging to work with heterogeneous datasets from multiple studies collected by different laboratories with diverse genotyping and imputation protocols. This work presents a new approach and an accompanying open-source toolbox that facilitates a robust integrative analysis for population structure and genomic ancestry estimates for heterogeneous datasets. We show robustness against individual outliers and different protocols for the projection of new samples into a reference ancestry space, and the ability to reveal and adjust for population structure in a simulated case–control admixed population. Given that visually evident and easily recognizable patterns of human facial characteristics co-vary with genomic ancestry, and based on the integration of three different sources of genome data, we generate average 3D faces to illustrate genomic ancestry variations within the 1,000 Genome project and for eight ancient-DNA profiles, respectively.
format Online
Article
Text
id pubmed-7367291
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7367291 2020-07-20 Sci Rep Article
Nature Publishing Group UK 2020-07-16 /pmc/articles/PMC7367291/ /pubmed/32678112 http://dx.doi.org/10.1038/s41598-020-68259-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7367291/
https://www.ncbi.nlm.nih.gov/pubmed/32678112
http://dx.doi.org/10.1038/s41598-020-68259-w