
Evaluating and sharing global genetic ancestry in biomedical datasets

Genetic ancestry is a critical co-factor to study phenotype-genotype associations using cohorts of human subjects. Most publicly available molecular datasets are, however, missing this information or only share self-reported race and ethnicity, representing a limitation to identify and repurpose dat...


Bibliographic Details
Main authors: Harismendy, Olivier; Kim, Jihoon; Xu, Xiaojun; Ohno-Machado, Lucila
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6433181/
https://www.ncbi.nlm.nih.gov/pubmed/30869786
http://dx.doi.org/10.1093/jamia/ocy194
author Harismendy, Olivier
Kim, Jihoon
Xu, Xiaojun
Ohno-Machado, Lucila
collection PubMed
description Genetic ancestry is a critical co-factor to study phenotype-genotype associations using cohorts of human subjects. Most publicly available molecular datasets are, however, missing this information or only share self-reported race and ethnicity, representing a limitation to identify and repurpose datasets to investigate the contribution of ancestry to diseases and traits. We propose an analytical framework to enrich the metadata from publicly available cohorts with genetic ancestry information and a resulting diversity score at continental resolution, calculated directly from the data. We illustrate this framework using The Cancer Genome Atlas datasets searched through the DataMed Data Discovery Index. Data repositories and contributors can use this framework to provide genetic diversity measurements for controlled access datasets, minimizing the work involved in requesting a dataset that may ultimately prove inadequate for a researcher’s purpose. With the increasing global scale of human genetics research, studies on disease risk and susceptibility would benefit greatly from the adequate estimation and sharing of genetic diversity in publicly available datasets following a framework such as the one presented.
format Online
Article
Text
id pubmed-6433181
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6433181 2019-03-28 Evaluating and sharing global genetic ancestry in biomedical datasets Harismendy, Olivier; Kim, Jihoon; Xu, Xiaojun; Ohno-Machado, Lucila J Am Med Inform Assoc Brief Communication Oxford University Press 2019-03-14 /pmc/articles/PMC6433181/ /pubmed/30869786 http://dx.doi.org/10.1093/jamia/ocy194 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Evaluating and sharing global genetic ancestry in biomedical datasets
topic Brief Communication
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6433181/
https://www.ncbi.nlm.nih.gov/pubmed/30869786
http://dx.doi.org/10.1093/jamia/ocy194