
Genotype Imputation Reference Panel Selection Using Maximal Phylogenetic Diversity

The recent dramatic cost reduction of next-generation sequencing technology enables investigators to assess most variants in the human genome to identify risk variants for complex diseases. However, sequencing large samples remains very expensive. For a study sample with existing genotype data, such...


Bibliographic Details
Main authors: Zhang, Peng; Zhan, Xiaowei; Rosenberg, Noah A.; Zöllner, Sebastian
Format: Online Article Text
Language: English
Published: Genetics Society of America 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3781962/
https://www.ncbi.nlm.nih.gov/pubmed/23934887
http://dx.doi.org/10.1534/genetics.113.154591
author Zhang, Peng
Zhan, Xiaowei
Rosenberg, Noah A.
Zöllner, Sebastian
collection PubMed
description The recent dramatic cost reduction of next-generation sequencing technology enables investigators to assess most variants in the human genome to identify risk variants for complex diseases. However, sequencing large samples remains very expensive. For a study sample with existing genotype data, such as array data from genome-wide association studies, a cost-effective approach is to sequence a subset of the study sample and then to impute the rest of the study sample, using the sequenced subset as a reference panel. The use of such an internal reference panel identifies population-specific variants and avoids the problem of a substantial mismatch in ancestry background between the study population and the reference population. To efficiently select an internal panel, we introduce an idea of phylogenetic diversity from mathematical phylogenetics and comparative genomics. We propose the “most diverse reference panel”, defined as the subset with the maximal “phylogenetic diversity”, thereby incorporating individuals that span a diverse range of genotypes within the sample. Using data both from simulations and from the 1000 Genomes Project, we show that the most diverse reference panel can substantially improve the imputation accuracy compared to randomly selected reference panels, especially for the imputation of rare variants. The improvement in imputation accuracy holds across different marker densities, reference panel sizes, and lengths for the imputed segments. We thus propose a novel strategy for planning sequencing studies on samples with existing genotype data.
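The selection idea in the description above can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say how the tree is built or how the subset is searched, so the sketch assumes a small rooted tree with branch lengths is already available, measures diversity as the total branch length connecting the chosen individuals to the root, and uses a simple greedy pick. The tree encoding and the names `leaves`, `gain`, and `select_diverse_panel` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy encoding: child -> (parent, branch length to parent).
# The root has no entry. Leaves stand for individuals in the study sample.
Tree = Dict[str, Tuple[str, float]]


def leaves(tree: Tree) -> List[str]:
    """Return the nodes that are never a parent, in sorted order."""
    parents = {parent for parent, _ in tree.values()}
    return sorted(node for node in tree if node not in parents)


def gain(tree: Tree, leaf: str, covered: Set[str]) -> float:
    """Branch length a leaf would add: walk up until a covered edge or the root."""
    total = 0.0
    node = leaf
    while node in tree and node not in covered:
        parent, length = tree[node]
        total += length
        node = parent
    return total


def select_diverse_panel(tree: Tree, k: int) -> Tuple[List[str], float]:
    """Greedily pick up to k leaves, each time adding the most new branch length."""
    covered: Set[str] = set()  # nodes whose edge to the parent is already counted
    panel: List[str] = []
    diversity = 0.0
    candidates = leaves(tree)
    for _ in range(min(k, len(candidates))):
        best = max((c for c in candidates if c not in panel),
                   key=lambda c: gain(tree, c, covered))
        diversity += gain(tree, best, covered)
        node = best
        while node in tree and node not in covered:  # mark the new path as counted
            covered.add(node)
            node = tree[node][0]
        panel.append(best)
    return panel, diversity


# Assumed example: two clades under the root, one containing a long branch (s4).
toy_tree: Tree = {
    "n1": ("root", 1.0), "n2": ("root", 1.0),
    "s1": ("n1", 0.1), "s2": ("n1", 0.2),
    "s3": ("n2", 0.5), "s4": ("n2", 3.0),
}
panel, diversity = select_diverse_panel(toy_tree, 2)
print(panel, diversity)
```

In this toy tree the greedy pick takes one individual from each clade rather than two close relatives, which is the intuition behind preferring a diverse panel over a random one.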
format Online
Article
Text
id pubmed-3781962
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-3781962 2013-10-01 Genotype Imputation Reference Panel Selection Using Maximal Phylogenetic Diversity Zhang, Peng Zhan, Xiaowei Rosenberg, Noah A. Zöllner, Sebastian Genetics Investigations
Genetics Society of America 2013-10 /pmc/articles/PMC3781962/ /pubmed/23934887 http://dx.doi.org/10.1534/genetics.113.154591 Text en Copyright © 2013 by the Genetics Society of America. Available freely online through the author-supported open access option.
title Genotype Imputation Reference Panel Selection Using Maximal Phylogenetic Diversity
topic Investigations