On combining family- and population-based sequencing data

Bibliographic details
Main authors:	Katsumata, Yuriko; Fardo, David W.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133531/ https://www.ncbi.nlm.nih.gov/pubmed/27980632 http://dx.doi.org/10.1186/s12919-016-0026-9

author	Katsumata, Yuriko; Fardo, David W.
collection	PubMed
description	Several statistical group-based approaches have been proposed to detect effects of variation within a gene for each of the population- and family-based designs. However, unified tests to combine gene-phenotype associations obtained from these two study designs are not yet well established. In this study, we investigated the efficient combination of population-based and family-based sequencing data to evaluate best practices using the Genetic Analysis Workshop 19 (GAW19) data set. Because one design employed whole genome sequencing and the other whole exome sequencing, we examined variants overlapping both data sets. We used the family-based sequence kernel association test (famSKAT) to analyze the family- and population-based data sets separately as well as a combined data set. These were compared against meta-analysis. Using the combined data, we showed that famSKAT has high power to detect associations between diastolic and/or systolic blood pressure and genes harboring causal variants with large effect sizes, such as MAP4, TNN, and CGN. However, when power differed considerably between the family- and population-based data, famSKAT with the combined data had lower power than with the population-based data alone. The famSKAT test statistic for the combined data can be influenced by sample imbalance between the two designs. This underscores the importance of foresight in study design: in this situation, the much smaller family-based sample essentially dilutes the signal. We observed inflated type I error rates in our simulation study, largely when using population-based data, possibly because principal components did not fully account for population admixture in this cohort.
format	Online Article Text
id	pubmed-5133531
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5133531 2016-12-15 BMC Proc Proceedings BioMed Central 2016-10-18 /pmc/articles/PMC5133531/ /pubmed/27980632 http://dx.doi.org/10.1186/s12919-016-0026-9 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	On combining family- and population-based sequencing data
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133531/ https://www.ncbi.nlm.nih.gov/pubmed/27980632 http://dx.doi.org/10.1186/s12919-016-0026-9