
An ancestry informative marker panel design for individual ancestry estimation of Hispanic population using whole exome sequencing data

Bibliographic Details
Main authors: Wang, Li-Ju; Zhang, Catherine W.; Su, Sophia C.; Chen, Hung-I H.; Chiu, Yu-Chiao; Lai, Zhao; Bouamar, Hakim; Ramirez, Amelie G.; Cigarroa, Francisco G.; Sun, Lu-Zhe; Chen, Yidong
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6936141/
https://www.ncbi.nlm.nih.gov/pubmed/31888480
http://dx.doi.org/10.1186/s12864-019-6333-6
collection PubMed
description BACKGROUND: European and American Indian ancestry are the major genetic ancestral components of Hispanics in the U.S. These ancestral groups have markedly different incidence rates and outcomes in many types of cancer. Genetic admixture may therefore bias genetic association studies of cancer susceptibility variants, specifically in Hispanics. For example, the incidence rate of liver cancer differs substantially among Hispanic, Asian and non-Hispanic white populations. Ancestry informative marker (AIM) panels of up to a few hundred ancestry-informative single nucleotide polymorphisms (SNPs) are widely used to infer ancestry admixture. Notably, currently available AIMs are predominantly located in intronic and intergenic regions, which the whole exome sequencing (WES) protocols commonly used in translational research and clinical practice do not cover. Thus, it remains challenging to accurately determine a patient’s admixture proportions without additional DNA testing.
RESULTS: In this study we designed a unique AIM panel that infers 3-way genetic admixture from three distinct, selected continental populations (African (AFR), European (EUR), and East Asian (EAS)) using SNPs within evolutionarily conserved exonic regions. Starting from about 1 million exonic SNPs from the three selected populations in the 1000 Genomes Project, we trimmed the SNPs by linkage disequilibrium (LD), restricted them to biallelic variants, and used ancestral informativeness statistics to optimize a panel of 250 SNP markers, the UT-AIM250 panel. Compared with published AIM panels, UT-AIM250 achieved higher accuracy when tested on the three ancestral populations (accuracy: 0.995 ± 0.012 for AFR, 0.997 ± 0.007 for EUR, and 0.994 ± 0.012 for EAS). We further evaluated the UT-AIM250 panel on admixed American (AMR) samples of the 1000 Genomes Project and obtained results (AFR, 0.085 ± 0.098; EUR, 0.665 ± 0.182; and EAS, 0.250 ± 0.205) similar to those of previously published AIM panels (Phillips-AIM34: AFR, 0.096 ± 0.127, EUR, 0.575 ± 0.290, and EAS, 0.330 ± 0.315; Wei-AIM278: AFR, 0.070 ± 0.096, EUR, 0.537 ± 0.267, and EAS, 0.393 ± 0.300). Subsequently, we applied the UT-AIM250 panel to a clinical dataset of 26 self-reported Hispanic patients in South Texas with hepatocellular carcinoma (HCC). Using WES data from adjacent non-cancer liver tissues, we estimated the admixture proportions (AFR, 0.065 ± 0.043; EUR, 0.594 ± 0.150; and EAS, 0.341 ± 0.160). Similar admixture proportions were obtained from the corresponding tumor tissues. In addition, we estimated admixture proportions for The Cancer Genome Atlas (TCGA) collection of hepatocellular carcinoma (TCGA-LIHC) samples (376 patients) using the UT-AIM250 panel. The panel obtained consistent admixture proportions from tumor and matched normal tissues, identified 3 possibly incorrectly reported race/ethnicity labels, and can provide a race/ethnicity determination when needed.
CONCLUSIONS: Here we demonstrated the feasibility of using evolutionarily conserved exonic regions to infer admixture proportions and provided a robust and reliable control for sample collection or patient stratification in genetic analysis. An R implementation of UT-AIM250 is available at https://github.com/chenlabgccri/UT-AIM250.
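The abstract above reports 3-way admixture proportions (AFR, EUR, EAS) per sample but does not say which estimator produces them. As a rough illustration only, the Python sketch below shows one common way such proportions can be estimated when reference allele frequencies are treated as known: an expectation-maximization (EM) fit of a simple mixture model, in which each allele copy is assumed to come from population k with probability q_k. The function name `estimate_admixture`, its parameters, and the toy frequencies are hypothetical and invented for this example; they are not taken from the UT-AIM250 R implementation or from the panel's real markers.

```python
import numpy as np


def estimate_admixture(genotypes, ref_freqs, n_iter=200, tol=1e-8):
    """EM estimate of admixture proportions for one individual.

    Assumptions (not from the source record): reference allele
    frequencies are known and fixed, markers are independent, and
    genotypes are coded as alternate-allele counts 0, 1 or 2.

    genotypes : array of shape (M,)
    ref_freqs : array of shape (K, M), alternate-allele frequency of
                each marker in each of the K reference populations
    """
    g = np.asarray(genotypes, dtype=float)
    # Clip frequencies away from 0 and 1 to avoid division by zero.
    p = np.clip(np.asarray(ref_freqs, dtype=float), 1e-6, 1 - 1e-6)
    k, m = p.shape
    if g.shape != (m,):
        raise ValueError("genotypes must have one entry per marker")

    q = np.full(k, 1.0 / k)  # start from equal proportions
    for _ in range(n_iter):
        # E-step: posterior probability that an alternate (or reference)
        # allele copy at each marker came from each population.
        alt = q[:, None] * p
        ref = q[:, None] * (1.0 - p)
        alt /= alt.sum(axis=0)
        ref /= ref.sum(axis=0)
        # M-step: average the posteriors over all 2*M allele copies.
        q_new = (alt * g + ref * (2.0 - g)).sum(axis=1) / (2.0 * m)
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break
    return q


# Toy reference panel: 3 populations x 30 markers, each block of 10
# markers is common (0.95) in one population and rare (0.05) elsewhere.
toy_freqs = np.full((3, 30), 0.05)
toy_freqs[0, :10] = 0.95
toy_freqs[1, 10:20] = 0.95
toy_freqs[2, 20:30] = 0.95

# Toy individual carrying only the alleles typical of population 0.
toy_genotypes = np.zeros(30)
toy_genotypes[:10] = 2

q_hat = estimate_admixture(toy_genotypes, toy_freqs)
print(np.round(q_hat, 3))
```

In this toy setup the individual should be assigned almost entirely to the first population. With real data, the genotype vector would come from WES calls at the panel's marker positions and the frequency matrix from the reference populations, and markers with missing calls would need to be dropped first, which this sketch does not handle.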
id pubmed-6936141
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6936141 2019-12-31 BMC Genomics Research BioMed Central 2019-12-30 /pmc/articles/PMC6936141/ /pubmed/31888480 http://dx.doi.org/10.1186/s12864-019-6333-6 Text en
© The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research