
Imputing rare variants in families using a two-stage approach

BACKGROUND: Recent focus on studying rare variants makes imputation accuracy of rare variants an important issue. Many approaches have been proposed to increase imputation accuracy among rare variants, from reference panel selection to combinations of existing methods to multistage analyses. We aime...


Bibliographic Details
Main Authors: Lent, Samantha, Deng, Xuan, Cupples, L. Adrienne, Lunetta, Kathryn L., Liu, CT, Zhou, Yanhua
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133481/
https://www.ncbi.nlm.nih.gov/pubmed/27980638
http://dx.doi.org/10.1186/s12919-016-0032-y
_version_ 1782471270860849152
author Lent, Samantha
Deng, Xuan
Cupples, L. Adrienne
Lunetta, Kathryn L.
Liu, CT
Zhou, Yanhua
author_facet Lent, Samantha
Deng, Xuan
Cupples, L. Adrienne
Lunetta, Kathryn L.
Liu, CT
Zhou, Yanhua
author_sort Lent, Samantha
collection PubMed
description BACKGROUND: Recent focus on studying rare variants makes imputation accuracy of rare variants an important issue. Many approaches have been proposed to increase imputation accuracy among rare variants, from reference panel selection to combinations of existing methods to multistage analyses. We aimed to bring the strengths of these new approaches together with our proposed two-stage imputation for family data. METHODS: Our imputation methods were tested on the region from 46.75Mb to 49.25Mb on chromosome 3. We did quality control based on the proportion of missing genotypes per variant and individual, leaving 495 individuals with 761 genome-wide association studies (GWAS) variants only, 45 with 14,077 sequence variants only, and 419 with both GWAS and sequencing data. All data were prephased using SHAPEIT2 with a duo hidden Markov model algorithm prior to performing imputation. Imputations were performed 100 times, each time masking the sequence data for 1 individual and imputing it from the GWAS data. We used well-imputed genotypes, defined as a probability of greater than 0.9, above 2 different minor allele frequency cutoffs—0.01 and 0.05—from Impute2 as input for Merlin, and compared these results to Impute2 and Merlin separately. The imputed results were evaluated using correlation measurement and the imputation quality score. RESULTS: Our method improved imputation accuracy, measured by imputation quality score, for variants with minor allele frequency between 0.01 and 0.40, but failed to improve accuracy for variants with minor allele frequency less than 0.01 when we used a minor allele frequency cutoff of 0.01 for the Impute2 results. In contrast, our 2-stage approach with a minor allele frequency cutoff of 0.05 performed the worst of all methods for variants with minor allele frequency between 0.01 and 0.40. CONCLUSIONS: This method gave promising results, but may be further improved by changing the inclusion criteria of Impute2 variants. 
More analyses are needed on a larger region with different inclusion thresholds to assess the accuracy of this approach.
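The selection step between the two stages described above (keeping Impute2 genotypes with a probability greater than 0.9 and a minor allele frequency above a cutoff before passing them to Merlin) might be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names, the dictionary layout, and the variant IDs are hypothetical, and estimating minor allele frequency from expected dosages and filtering per genotype call rather than per variant are assumptions that the abstract does not confirm.

```python
from typing import Dict, List, Optional, Tuple

# Posterior probabilities of carrying 0, 1, or 2 copies of the counted allele.
GenotypeProbs = Tuple[float, float, float]


def minor_allele_frequency(probs: List[GenotypeProbs]) -> float:
    """Estimate MAF from expected allele dosages (an assumption of this sketch)."""
    mean_dosage = sum(p1 + 2.0 * p2 for _, p1, p2 in probs) / len(probs)
    freq = mean_dosage / 2.0
    return min(freq, 1.0 - freq)


def select_well_imputed(
    imputed: Dict[str, List[GenotypeProbs]],
    maf_cutoff: float,
    prob_threshold: float = 0.9,
) -> Dict[str, List[Optional[int]]]:
    """Keep variants with MAF above the cutoff; within those variants, keep a
    best-guess genotype only when its probability exceeds the threshold.
    Calls that fail the threshold become None (treated as missing)."""
    selected: Dict[str, List[Optional[int]]] = {}
    for variant, probs in imputed.items():
        if minor_allele_frequency(probs) <= maf_cutoff:
            continue  # variant is at or below the MAF cutoff: drop it
        calls: List[Optional[int]] = []
        for triple in probs:
            best = max(range(3), key=lambda g: triple[g])
            calls.append(best if triple[best] > prob_threshold else None)
        selected[variant] = calls
    return selected


# Hypothetical toy input: two variants, three individuals each.
example = {
    "var_common": [(0.95, 0.04, 0.01), (0.02, 0.96, 0.02), (0.50, 0.45, 0.05)],
    "var_rare": [(0.99, 0.01, 0.0), (0.995, 0.005, 0.0), (0.99, 0.01, 0.0)],
}
first_stage_output = select_well_imputed(example, maf_cutoff=0.01)
```

Raising `maf_cutoff` from 0.01 to 0.05 in this sketch passes fewer first-stage variants to the second stage, which is the kind of inclusion threshold the conclusions suggest revisiting.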
format Online
Article
Text
id pubmed-5133481
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-51334812016-12-15 Imputing rare variants in families using a two-stage approach Lent, Samantha Deng, Xuan Cupples, L. Adrienne Lunetta, Kathryn L. Liu, CT Zhou, Yanhua BMC Proc Proceedings BACKGROUND: Recent focus on studying rare variants makes imputation accuracy of rare variants an important issue. Many approaches have been proposed to increase imputation accuracy among rare variants, from reference panel selection to combinations of existing methods to multistage analyses. We aimed to bring the strengths of these new approaches together with our proposed two-stage imputation for family data. METHODS: Our imputation methods were tested on the region from 46.75Mb to 49.25Mb on chromosome 3. We did quality control based on the proportion of missing genotypes per variant and individual, leaving 495 individuals with 761 genome-wide association studies (GWAS) variants only, 45 with 14,077 sequence variants only, and 419 with both GWAS and sequencing data. All data were prephased using SHAPEIT2 with a duo hidden Markov model algorithm prior to performing imputation. Imputations were performed 100 times, each time masking the sequence data for 1 individual and imputing it from the GWAS data. We used well-imputed genotypes, defined as a probability of greater than 0.9, above 2 different minor allele frequency cutoffs—0.01 and 0.05—from Impute2 as input for Merlin, and compared these results to Impute2 and Merlin separately. The imputed results were evaluated using correlation measurement and the imputation quality score. RESULTS: Our method improved imputation accuracy, measured by imputation quality score, for variants with minor allele frequency between 0.01 and 0.40, but failed to improve accuracy for variants with minor allele frequency less than 0.01 when we used a minor allele frequency cutoff of 0.01 for the Impute2 results. 
In contrast, our 2-stage approach with a minor allele frequency cutoff of 0.05 performed the worst of all methods for variants with minor allele frequency between 0.01 and 0.40. CONCLUSIONS: This method gave promising results, but may be further improved by changing the inclusion criteria of Impute2 variants. More analyses are needed on a larger region with different inclusion thresholds to assess the accuracy of this approach. BioMed Central 2016-10-18 /pmc/articles/PMC5133481/ /pubmed/27980638 http://dx.doi.org/10.1186/s12919-016-0032-y Text en © The Author(s). 2016 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Proceedings
Lent, Samantha
Deng, Xuan
Cupples, L. Adrienne
Lunetta, Kathryn L.
Liu, CT
Zhou, Yanhua
Imputing rare variants in families using a two-stage approach
title Imputing rare variants in families using a two-stage approach
title_full Imputing rare variants in families using a two-stage approach
title_fullStr Imputing rare variants in families using a two-stage approach
title_full_unstemmed Imputing rare variants in families using a two-stage approach
title_short Imputing rare variants in families using a two-stage approach
title_sort imputing rare variants in families using a two-stage approach
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133481/
https://www.ncbi.nlm.nih.gov/pubmed/27980638
http://dx.doi.org/10.1186/s12919-016-0032-y
work_keys_str_mv AT lentsamantha imputingrarevariantsinfamiliesusingatwostageapproach
AT dengxuan imputingrarevariantsinfamiliesusingatwostageapproach
AT cupplesladrienne imputingrarevariantsinfamiliesusingatwostageapproach
AT lunettakathrynl imputingrarevariantsinfamiliesusingatwostageapproach
AT liuct imputingrarevariantsinfamiliesusingatwostageapproach
AT zhouyanhua imputingrarevariantsinfamiliesusingatwostageapproach