
A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes

BACKGROUND: Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or s...


Bibliographic Details
Main authors: Hickey, John M, Kinghorn, Brian P, Tier, Bruce, Wilson, James F, Dunstan, Neil, van der Werf, Julius HJ
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3068938/
https://www.ncbi.nlm.nih.gov/pubmed/21388557
http://dx.doi.org/10.1186/1297-9686-43-12
_version_ 1782201286715768832
author Hickey, John M
Kinghorn, Brian P
Tier, Bruce
Wilson, James F
Dunstan, Neil
van der Werf, Julius HJ
author_facet Hickey, John M
Kinghorn, Brian P
Tier, Bruce
Wilson, James F
Dunstan, Neil
van der Werf, Julius HJ
author_sort Hickey, John M
collection PubMed
description BACKGROUND: Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data. METHODS: A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information. RESULTS: The algorithm performed well in both simulated and real livestock and human datasets in terms of both phasing accuracy and computational efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98% while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for dataset sizes less than 1000, but was not affected by effective population size, family data structure, presence or absence of pedigree information, or SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% fewer phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available. CONCLUSIONS: The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets.
format Text
id pubmed-3068938
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-30689382011-04-01 A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes Hickey, John M Kinghorn, Brian P Tier, Bruce Wilson, James F Dunstan, Neil van der Werf, Julius HJ Genet Sel Evol Research BioMed Central 2011-03-10 /pmc/articles/PMC3068938/ /pubmed/21388557 http://dx.doi.org/10.1186/1297-9686-43-12 Text en Copyright ©2011 Hickey et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Hickey, John M
Kinghorn, Brian P
Tier, Bruce
Wilson, James F
Dunstan, Neil
van der Werf, Julius HJ
A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes
title A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes
title_full A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes
title_fullStr A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes
title_full_unstemmed A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes
title_short A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes
title_sort combined long-range phasing and long haplotype imputation method to impute phase for snp genotypes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3068938/
https://www.ncbi.nlm.nih.gov/pubmed/21388557
http://dx.doi.org/10.1186/1297-9686-43-12