Imputing rare variants in families using a two-stage approach
BACKGROUND: Recent focus on studying rare variants makes imputation accuracy of rare variants an important issue. Many approaches have been proposed to increase imputation accuracy among rare variants, from reference panel selection to combinations of existing methods to multistage analyses. We aime...
Main authors: | Lent, Samantha; Deng, Xuan; Cupples, L. Adrienne; Lunetta, Kathryn L.; Liu, CT; Zhou, Yanhua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133481/ https://www.ncbi.nlm.nih.gov/pubmed/27980638 http://dx.doi.org/10.1186/s12919-016-0032-y |
_version_ | 1782471270860849152 |
---|---|
author | Lent, Samantha Deng, Xuan Cupples, L. Adrienne Lunetta, Kathryn L. Liu, CT Zhou, Yanhua |
author_facet | Lent, Samantha Deng, Xuan Cupples, L. Adrienne Lunetta, Kathryn L. Liu, CT Zhou, Yanhua |
author_sort | Lent, Samantha |
collection | PubMed |
description | BACKGROUND: Recent focus on studying rare variants makes imputation accuracy of rare variants an important issue. Many approaches have been proposed to increase imputation accuracy among rare variants, from reference panel selection to combinations of existing methods to multistage analyses. We aimed to bring the strengths of these new approaches together with our proposed two-stage imputation for family data. METHODS: Our imputation methods were tested on the region from 46.75 Mb to 49.25 Mb on chromosome 3. We did quality control based on the proportion of missing genotypes per variant and individual, leaving 495 individuals with 761 genome-wide association studies (GWAS) variants only, 45 with 14,077 sequence variants only, and 419 with both GWAS and sequencing data. All data were prephased using SHAPEIT2 with a duo hidden Markov model algorithm prior to performing imputation. Imputations were performed 100 times, each time masking the sequence data for 1 individual and imputing it from the GWAS data. We used well-imputed genotypes, defined as a probability of greater than 0.9, above 2 different minor allele frequency cutoffs (0.01 and 0.05) from Impute2 as input for Merlin, and compared these results to Impute2 and Merlin separately. The imputed results were evaluated using correlation measurement and the imputation quality score. RESULTS: Our method improved imputation accuracy, measured by imputation quality score, for variants with minor allele frequency between 0.01 and 0.40, but failed to improve accuracy for variants with minor allele frequency less than 0.01 when we used a minor allele frequency cutoff of 0.01 for the Impute2 results. In contrast, our 2-stage approach with a minor allele frequency cutoff of 0.05 performed the worst of all methods for variants with minor allele frequency between 0.01 and 0.40. CONCLUSIONS: This method gave promising results, but may be further improved by changing the inclusion criteria of Impute2 variants. More analyses are needed on a larger region with different inclusion thresholds to assess the accuracy of this approach. |
format | Online Article Text |
id | pubmed-5133481 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-51334812016-12-15 Imputing rare variants in families using a two-stage approach Lent, Samantha Deng, Xuan Cupples, L. Adrienne Lunetta, Kathryn L. Liu, CT Zhou, Yanhua BMC Proc Proceedings BACKGROUND: Recent focus on studying rare variants makes imputation accuracy of rare variants an important issue. Many approaches have been proposed to increase imputation accuracy among rare variants, from reference panel selection to combinations of existing methods to multistage analyses. We aimed to bring the strengths of these new approaches together with our proposed two-stage imputation for family data. METHODS: Our imputation methods were tested on the region from 46.75 Mb to 49.25 Mb on chromosome 3. We did quality control based on the proportion of missing genotypes per variant and individual, leaving 495 individuals with 761 genome-wide association studies (GWAS) variants only, 45 with 14,077 sequence variants only, and 419 with both GWAS and sequencing data. All data were prephased using SHAPEIT2 with a duo hidden Markov model algorithm prior to performing imputation. Imputations were performed 100 times, each time masking the sequence data for 1 individual and imputing it from the GWAS data. We used well-imputed genotypes, defined as a probability of greater than 0.9, above 2 different minor allele frequency cutoffs (0.01 and 0.05) from Impute2 as input for Merlin, and compared these results to Impute2 and Merlin separately. The imputed results were evaluated using correlation measurement and the imputation quality score. RESULTS: Our method improved imputation accuracy, measured by imputation quality score, for variants with minor allele frequency between 0.01 and 0.40, but failed to improve accuracy for variants with minor allele frequency less than 0.01 when we used a minor allele frequency cutoff of 0.01 for the Impute2 results. In contrast, our 2-stage approach with a minor allele frequency cutoff of 0.05 performed the worst of all methods for variants with minor allele frequency between 0.01 and 0.40. CONCLUSIONS: This method gave promising results, but may be further improved by changing the inclusion criteria of Impute2 variants. More analyses are needed on a larger region with different inclusion thresholds to assess the accuracy of this approach. BioMed Central 2016-10-18 /pmc/articles/PMC5133481/ /pubmed/27980638 http://dx.doi.org/10.1186/s12919-016-0032-y Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Lent, Samantha Deng, Xuan Cupples, L. Adrienne Lunetta, Kathryn L. Liu, CT Zhou, Yanhua Imputing rare variants in families using a two-stage approach |
title | Imputing rare variants in families using a two-stage approach |
title_full | Imputing rare variants in families using a two-stage approach |
title_fullStr | Imputing rare variants in families using a two-stage approach |
title_full_unstemmed | Imputing rare variants in families using a two-stage approach |
title_short | Imputing rare variants in families using a two-stage approach |
title_sort | imputing rare variants in families using a two-stage approach |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133481/ https://www.ncbi.nlm.nih.gov/pubmed/27980638 http://dx.doi.org/10.1186/s12919-016-0032-y |
work_keys_str_mv | AT lentsamantha imputingrarevariantsinfamiliesusingatwostageapproach AT dengxuan imputingrarevariantsinfamiliesusingatwostageapproach AT cupplesladrienne imputingrarevariantsinfamiliesusingatwostageapproach AT lunettakathrynl imputingrarevariantsinfamiliesusingatwostageapproach AT liuct imputingrarevariantsinfamiliesusingatwostageapproach AT zhouyanhua imputingrarevariantsinfamiliesusingatwostageapproach |
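
Illustrative note on the METHODS in the description field: the abstract says that well-imputed genotypes (probability greater than 0.9) at variants above a minor allele frequency cutoff (0.01 or 0.05) were passed from Impute2 to Merlin. The Python sketch below shows one way such a selection step could look. It is not the authors' code and does not read or write any Impute2 or Merlin file format. The function name, the data layout, the variant identifiers, the strict "greater than" comparison against the cutoff, and the choice to mark uncertain genotypes as missing (rather than dropping the variant) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: for each individual at a variant, the probabilities of
# carrying 0, 1, or 2 copies of the minor allele.
GenoProbs = Tuple[float, float, float]


def select_well_imputed(
    probs: Dict[str, List[GenoProbs]],
    maf: Dict[str, float],
    prob_threshold: float = 0.9,
    maf_cutoff: float = 0.01,
) -> Dict[str, List[Optional[int]]]:
    """Keep variants above the MAF cutoff and, within them, only confident calls.

    Assumptions (not stated in the abstract): "above" means strictly greater
    than the cutoff, and a genotype whose best probability does not exceed
    prob_threshold is recorded as missing (None).
    """
    selected: Dict[str, List[Optional[int]]] = {}
    for variant, per_person in probs.items():
        if maf[variant] <= maf_cutoff:
            continue  # variant is not passed on to the second stage
        calls: List[Optional[int]] = []
        for p in per_person:
            best = max(range(3), key=p.__getitem__)  # most likely genotype
            calls.append(best if p[best] > prob_threshold else None)
        selected[variant] = calls
    return selected


# Made-up example data: three variants, two individuals each.
example_probs = {
    "var_common": [(0.95, 0.04, 0.01), (0.50, 0.45, 0.05)],
    "var_lowfreq": [(0.02, 0.97, 0.01), (0.00, 0.05, 0.95)],
    "var_rare": [(0.99, 0.01, 0.00), (0.98, 0.02, 0.00)],
}
example_maf = {"var_common": 0.12, "var_lowfreq": 0.03, "var_rare": 0.004}

at_001 = select_well_imputed(example_probs, example_maf, maf_cutoff=0.01)
at_005 = select_well_imputed(example_probs, example_maf, maf_cutoff=0.05)
```

With the made-up data, the 0.05 cutoff passes fewer variants to the second stage than the 0.01 cutoff. That is the knob the abstract's CONCLUSIONS refer to when they suggest changing the inclusion criteria of Impute2 variants.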