
Imputing rare variants in families using a two-stage approach

BACKGROUND: Recent focus on studying rare variants makes imputation accuracy of rare variants an important issue. Many approaches have been proposed to increase imputation accuracy among rare variants, from reference panel selection to combinations of existing methods to multistage analyses. We aime...


Bibliographic Details
Main Authors:	Lent, Samantha, Deng, Xuan, Cupples, L. Adrienne, Lunetta, Kathryn L., Liu, CT, Zhou, Yanhua
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133481/ https://www.ncbi.nlm.nih.gov/pubmed/27980638 http://dx.doi.org/10.1186/s12919-016-0032-y

_version_	1782471270860849152
author	Lent, Samantha Deng, Xuan Cupples, L. Adrienne Lunetta, Kathryn L. Liu, CT Zhou, Yanhua
author_facet	Lent, Samantha Deng, Xuan Cupples, L. Adrienne Lunetta, Kathryn L. Liu, CT Zhou, Yanhua
author_sort	Lent, Samantha
collection	PubMed
description	BACKGROUND: Recent focus on studying rare variants makes imputation accuracy of rare variants an important issue. Many approaches have been proposed to increase imputation accuracy among rare variants, from reference panel selection to combinations of existing methods to multistage analyses. We aimed to bring the strengths of these new approaches together with our proposed two-stage imputation for family data. METHODS: Our imputation methods were tested on the region from 46.75Mb to 49.25Mb on chromosome 3. We did quality control based on the proportion of missing genotypes per variant and individual, leaving 495 individuals with 761 genome-wide association studies (GWAS) variants only, 45 with 14,077 sequence variants only, and 419 with both GWAS and sequencing data. All data were prephased using SHAPEIT2 with a duo hidden Markov model algorithm prior to performing imputation. Imputations were performed 100 times, each time masking the sequence data for 1 individual and imputing it from the GWAS data. We used well-imputed genotypes, defined as a probability of greater than 0.9, above 2 different minor allele frequency cutoffs—0.01 and 0.05—from Impute2 as input for Merlin, and compared these results to Impute2 and Merlin separately. The imputed results were evaluated using correlation measurement and the imputation quality score. RESULTS: Our method improved imputation accuracy, measured by imputation quality score, for variants with minor allele frequency between 0.01 and 0.40, but failed to improve accuracy for variants with minor allele frequency less than 0.01 when we used a minor allele frequency cutoff of 0.01 for the Impute2 results. In contrast, our 2-stage approach with a minor allele frequency cutoff of 0.05 performed the worst of all methods for variants with minor allele frequency between 0.01 and 0.40. CONCLUSIONS: This method gave promising results, but may be further improved by changing the inclusion criteria of Impute2 variants. 
More analyses are needed on a larger region with different inclusion thresholds to assess the accuracy of this approach.
format	Online Article Text
id	pubmed-5133481
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-51334812016-12-15 Imputing rare variants in families using a two-stage approach Lent, Samantha Deng, Xuan Cupples, L. Adrienne Lunetta, Kathryn L. Liu, CT Zhou, Yanhua BMC Proc Proceedings BACKGROUND: Recent focus on studying rare variants makes imputation accuracy of rare variants an important issue. Many approaches have been proposed to increase imputation accuracy among rare variants, from reference panel selection to combinations of existing methods to multistage analyses. We aimed to bring the strengths of these new approaches together with our proposed two-stage imputation for family data. METHODS: Our imputation methods were tested on the region from 46.75Mb to 49.25Mb on chromosome 3. We did quality control based on the proportion of missing genotypes per variant and individual, leaving 495 individuals with 761 genome-wide association studies (GWAS) variants only, 45 with 14,077 sequence variants only, and 419 with both GWAS and sequencing data. All data were prephased using SHAPEIT2 with a duo hidden Markov model algorithm prior to performing imputation. Imputations were performed 100 times, each time masking the sequence data for 1 individual and imputing it from the GWAS data. We used well-imputed genotypes, defined as a probability of greater than 0.9, above 2 different minor allele frequency cutoffs—0.01 and 0.05—from Impute2 as input for Merlin, and compared these results to Impute2 and Merlin separately. The imputed results were evaluated using correlation measurement and the imputation quality score. RESULTS: Our method improved imputation accuracy, measured by imputation quality score, for variants with minor allele frequency between 0.01 and 0.40, but failed to improve accuracy for variants with minor allele frequency less than 0.01 when we used a minor allele frequency cutoff of 0.01 for the Impute2 results. 
In contrast, our 2-stage approach with a minor allele frequency cutoff of 0.05 performed the worst of all methods for variants with minor allele frequency between 0.01 and 0.40. CONCLUSIONS: This method gave promising results, but may be further improved by changing the inclusion criteria of Impute2 variants. More analyses are needed on a larger region with different inclusion thresholds to assess the accuracy of this approach. BioMed Central 2016-10-18 /pmc/articles/PMC5133481/ /pubmed/27980638 http://dx.doi.org/10.1186/s12919-016-0032-y Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Proceedings Lent, Samantha Deng, Xuan Cupples, L. Adrienne Lunetta, Kathryn L. Liu, CT Zhou, Yanhua Imputing rare variants in families using a two-stage approach
title	Imputing rare variants in families using a two-stage approach
title_full	Imputing rare variants in families using a two-stage approach
title_fullStr	Imputing rare variants in families using a two-stage approach
title_full_unstemmed	Imputing rare variants in families using a two-stage approach
title_short	Imputing rare variants in families using a two-stage approach
title_sort	imputing rare variants in families using a two-stage approach
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133481/ https://www.ncbi.nlm.nih.gov/pubmed/27980638 http://dx.doi.org/10.1186/s12919-016-0032-y
work_keys_str_mv	AT lentsamantha imputingrarevariantsinfamiliesusingatwostageapproach AT dengxuan imputingrarevariantsinfamiliesusingatwostageapproach AT cupplesladrienne imputingrarevariantsinfamiliesusingatwostageapproach AT lunettakathrynl imputingrarevariantsinfamiliesusingatwostageapproach AT liuct imputingrarevariantsinfamiliesusingatwostageapproach AT zhouyanhua imputingrarevariantsinfamiliesusingatwostageapproach
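
Illustrative note: the METHODS text in the description field says that well-imputed genotypes (probability greater than 0.9) at variants above a minor allele frequency cutoff (0.01 or 0.05) were passed from Impute2 to Merlin. The Python sketch below shows one way such a filter might look. It is not the authors' code: the function names, the in-memory data layout, the use of expected dosage to estimate minor allele frequency, and the strict "greater than" comparison on the cutoff are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: for each individual, the posterior probabilities of
# carrying 0, 1, or 2 copies of the counted allele at one variant.
Probs = Tuple[float, float, float]


def minor_allele_frequency(probs_per_person: List[Probs]) -> float:
    """Estimate MAF from expected dosages (assumed; the abstract does not say how).

    Expects a non-empty list of individuals.
    """
    dosage = sum(p[1] + 2.0 * p[2] for p in probs_per_person)
    freq = dosage / (2.0 * len(probs_per_person))
    return min(freq, 1.0 - freq)


def select_well_imputed(
    variants: Dict[str, List[Probs]],
    prob_threshold: float = 0.9,
    maf_cutoff: float = 0.01,
) -> Dict[str, List[Optional[int]]]:
    """Keep variants with MAF above the cutoff; call only confident genotypes.

    A genotype is called when its largest posterior probability exceeds
    prob_threshold; otherwise it is set to None (treated as missing).
    """
    selected: Dict[str, List[Optional[int]]] = {}
    for variant_id, probs_per_person in variants.items():
        if minor_allele_frequency(probs_per_person) <= maf_cutoff:
            continue  # variant at or below the frequency cutoff: not passed on
        calls: List[Optional[int]] = []
        for probs in probs_per_person:
            best = max(range(3), key=lambda g: probs[g])
            calls.append(best if probs[best] > prob_threshold else None)
        selected[variant_id] = calls
    return selected
```

Calling `select_well_imputed(variants, maf_cutoff=0.05)` would correspond to the second cutoff mentioned in the description; genotypes returned as `None` would be left for the second-stage, family-based imputation to fill in.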
