Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data

Knowledge of biological relatedness between samples is important for many genetic studies. In large-scale human genetic association studies, the estimated kinship is used to remove cryptic relatedness, control for family structure, and estimate trait heritability. However, estimation of kinship is c...

Bibliographic Details
Main Authors:	Dou, Jinzhuang; Sun, Baoluo; Sim, Xueling; Hughes, Jason D.; Reilly, Dermot F.; Tai, E. Shyong; Liu, Jianjun; Wang, Chaolong
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5636172/ https://www.ncbi.nlm.nih.gov/pubmed/28961250 http://dx.doi.org/10.1371/journal.pgen.1007021

author	Dou, Jinzhuang; Sun, Baoluo; Sim, Xueling; Hughes, Jason D.; Reilly, Dermot F.; Tai, E. Shyong; Liu, Jianjun; Wang, Chaolong
collection	PubMed
description	Knowledge of biological relatedness between samples is important for many genetic studies. In large-scale human genetic association studies, the estimated kinship is used to remove cryptic relatedness, control for family structure, and estimate trait heritability. However, estimation of kinship is challenging for sparse sequencing data, such as those from off-target regions in target sequencing studies, where genotypes are largely uncertain or missing. Existing methods often assume accurate genotypes at a large number of markers across the genome. We show that these methods, without accounting for the genotype uncertainty in sparse sequencing data, can yield a strong downward bias in kinship estimation. We develop a computationally efficient method called SEEKIN to estimate kinship for both homogeneous samples and heterogeneous samples with population structure and admixture. Our method models genotype uncertainty and leverages linkage disequilibrium through imputation. We test SEEKIN on a whole exome sequencing dataset (WES) of Singapore Chinese and Malays, which involves substantial population structure and admixture. We show that SEEKIN can accurately estimate kinship coefficient and classify genetic relatedness using off-target sequencing data down sampled to ~0.15X depth. In application to the full WES dataset without down sampling, SEEKIN also outperforms existing methods by properly analyzing shallow off-target data (~0.75X). Using both simulated and real phenotypes, we further illustrate how our method improves estimation of trait heritability for WES studies.
format	Online Article Text
id	pubmed-5636172
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5636172; 2017-10-30; PLoS Genet; Research Article; Public Library of Science, 2017-09-29; /pmc/articles/PMC5636172/ /pubmed/28961250 http://dx.doi.org/10.1371/journal.pgen.1007021; Text; en; © 2017 Dou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5636172/ https://www.ncbi.nlm.nih.gov/pubmed/28961250 http://dx.doi.org/10.1371/journal.pgen.1007021
