Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images
Estimates of individual-level genomic ancestry are routinely used in human genetics, and related fields. The analysis of population structure and genomic ancestry can yield insights in terms of modern and ancient populations, allowing us to address questions regarding admixture, and the numbers and...
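Main authors: | Li, Jiarui; Zarzar, Tomás González; White, Julie D.; Indencleef, Karlijne; Hoskens, Hanne; Matthews, Harry; Nauwelaers, Nele; Zaidi, Arslan; Eller, Ryan J.; Herrick, Noah; Günther, Torsten; Svensson, Emma M.; Jakobsson, Mattias; Walsh, Susan; Van Steen, Kristel; Shriver, Mark D.; Claes, Peter |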
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7367291/ https://www.ncbi.nlm.nih.gov/pubmed/32678112 http://dx.doi.org/10.1038/s41598-020-68259-w |
_version_ | 1783560394697080832 |
---|---|
author | Li, Jiarui; Zarzar, Tomás González; White, Julie D.; Indencleef, Karlijne; Hoskens, Hanne; Matthews, Harry; Nauwelaers, Nele; Zaidi, Arslan; Eller, Ryan J.; Herrick, Noah; Günther, Torsten; Svensson, Emma M.; Jakobsson, Mattias; Walsh, Susan; Van Steen, Kristel; Shriver, Mark D.; Claes, Peter |
author_facet | Li, Jiarui; Zarzar, Tomás González; White, Julie D.; Indencleef, Karlijne; Hoskens, Hanne; Matthews, Harry; Nauwelaers, Nele; Zaidi, Arslan; Eller, Ryan J.; Herrick, Noah; Günther, Torsten; Svensson, Emma M.; Jakobsson, Mattias; Walsh, Susan; Van Steen, Kristel; Shriver, Mark D.; Claes, Peter |
author_sort | Li, Jiarui |
collection | PubMed |
description | Estimates of individual-level genomic ancestry are routinely used in human genetics, and related fields. The analysis of population structure and genomic ancestry can yield insights in terms of modern and ancient populations, allowing us to address questions regarding admixture, and the numbers and identities of the parental source populations. Unrecognized population structure is also an important confounder to correct for in genome-wide association studies. However, it remains challenging to work with heterogeneous datasets from multiple studies collected by different laboratories with diverse genotyping and imputation protocols. This work presents a new approach and an accompanying open-source toolbox that facilitates a robust integrative analysis for population structure and genomic ancestry estimates for heterogeneous datasets. We show robustness against individual outliers and different protocols for the projection of new samples into a reference ancestry space, and the ability to reveal and adjust for population structure in a simulated case–control admixed population. Given that visually evident and easily recognizable patterns of human facial characteristics co-vary with genomic ancestry, and based on the integration of three different sources of genome data, we generate average 3D faces to illustrate genomic ancestry variations within the 1,000 Genome project and for eight ancient-DNA profiles, respectively. |
format | Online Article Text |
id | pubmed-7367291 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-73672912020-07-20 Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images Li, Jiarui Zarzar, Tomás González White, Julie D. Indencleef, Karlijne Hoskens, Hanne Matthews, Harry Nauwelaers, Nele Zaidi, Arslan Eller, Ryan J. Herrick, Noah Günther, Torsten Svensson, Emma M. Jakobsson, Mattias Walsh, Susan Van Steen, Kristel Shriver, Mark D. Claes, Peter Sci Rep Article Estimates of individual-level genomic ancestry are routinely used in human genetics, and related fields. The analysis of population structure and genomic ancestry can yield insights in terms of modern and ancient populations, allowing us to address questions regarding admixture, and the numbers and identities of the parental source populations. Unrecognized population structure is also an important confounder to correct for in genome-wide association studies. However, it remains challenging to work with heterogeneous datasets from multiple studies collected by different laboratories with diverse genotyping and imputation protocols. This work presents a new approach and an accompanying open-source toolbox that facilitates a robust integrative analysis for population structure and genomic ancestry estimates for heterogeneous datasets. We show robustness against individual outliers and different protocols for the projection of new samples into a reference ancestry space, and the ability to reveal and adjust for population structure in a simulated case–control admixed population. Given that visually evident and easily recognizable patterns of human facial characteristics co-vary with genomic ancestry, and based on the integration of three different sources of genome data, we generate average 3D faces to illustrate genomic ancestry variations within the 1,000 Genome project and for eight ancient-DNA profiles, respectively. 
Nature Publishing Group UK 2020-07-16 /pmc/articles/PMC7367291/ /pubmed/32678112 http://dx.doi.org/10.1038/s41598-020-68259-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Li, Jiarui Zarzar, Tomás González White, Julie D. Indencleef, Karlijne Hoskens, Hanne Matthews, Harry Nauwelaers, Nele Zaidi, Arslan Eller, Ryan J. Herrick, Noah Günther, Torsten Svensson, Emma M. Jakobsson, Mattias Walsh, Susan Van Steen, Kristel Shriver, Mark D. Claes, Peter Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images |
title | Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images |
title_full | Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images |
title_fullStr | Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images |
title_full_unstemmed | Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images |
title_short | Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images |
title_sort | robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3d facial images |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7367291/ https://www.ncbi.nlm.nih.gov/pubmed/32678112 http://dx.doi.org/10.1038/s41598-020-68259-w |