Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation

Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent....


Bibliographic Details
Main Authors:	Palmer, Cameron; Pe’er, Itsik
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4910998/ https://www.ncbi.nlm.nih.gov/pubmed/27310603 http://dx.doi.org/10.1371/journal.pgen.1006091

_version_	1782438065146429440
author	Palmer, Cameron; Pe’er, Itsik
collection	PubMed
description	Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data.
format	Online Article Text
id	pubmed-4910998
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4910998 2016-07-06 Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation Palmer, Cameron; Pe’er, Itsik PLoS Genet Research Article
Public Library of Science 2016-06-16 /pmc/articles/PMC4910998/ /pubmed/27310603 http://dx.doi.org/10.1371/journal.pgen.1006091 Text en © 2016 Palmer, Pe’er http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4910998/ https://www.ncbi.nlm.nih.gov/pubmed/27310603 http://dx.doi.org/10.1371/journal.pgen.1006091
