Breakdown of Methods for Phasing and Imputation in the Presence of Double Genotype Sharing

In genome-wide association studies, results have been improved through imputation of a denser marker set based on reference haplotypes and phasing of the genotype data. To better handle very large sets of reference haplotypes, pre-phasing with only study individuals has been suggested. We present a...

Bibliographic Details
Main author: Nettelblad, Carl
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3610665/
https://www.ncbi.nlm.nih.gov/pubmed/23555958
http://dx.doi.org/10.1371/journal.pone.0060354
author Nettelblad, Carl
collection PubMed
description In genome-wide association studies, results have been improved through imputation of a denser marker set based on reference haplotypes and phasing of the genotype data. To better handle very large sets of reference haplotypes, pre-phasing with only study individuals has been suggested. We present a possible problem which is aggravated when pre-phasing strategies are used, and suggest a modification avoiding the resulting issues with application to the MaCH tool, although the underlying problem is not specific to that tool. We evaluate the effectiveness of our remedy to a subset of Hapmap data, comparing the original version of MaCH and our modified approach. Improvements are demonstrated on the original data (phase switch error rate decreasing by 10%), but the differences are more pronounced in cases where the data is augmented to represent the presence of closely related individuals, especially when siblings are present (30% reduction in switch error rate in the presence of children, 47% reduction in the presence of siblings). The main conclusion of this investigation is that existing statistical methods for phasing and imputation of unrelated individuals might give results of sub-par quality if a subset of study individuals nonetheless are related. As the populations collected for general genome-wide association studies grow in size, including relatives might become more common. If a general GWAS framework for unrelated individuals would be employed on datasets with some related individuals, such as including familial data or material from domesticated animals, caution should also be taken regarding the quality of haplotypes. Our modification to MaCH is available on request and straightforward to implement. We hope that this mode, if found to be of use, could be integrated as an option in future standard distributions of MaCH.
format Online
Article
Text
id pubmed-3610665
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3610665 2013-04-03 Breakdown of Methods for Phasing and Imputation in the Presence of Double Genotype Sharing Nettelblad, Carl PLoS One Research Article Public Library of Science 2013-03-28 /pmc/articles/PMC3610665/ /pubmed/23555958 http://dx.doi.org/10.1371/journal.pone.0060354 Text en © 2013 Carl Nettelblad http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Breakdown of Methods for Phasing and Imputation in the Presence of Double Genotype Sharing
topic Research Article