
Simpute: An Efficient Solution for Dense Genotypic Data

Single nucleotide polymorphism (SNP) data derived from array-based technology or massively parallel sequencing often contain missing values. Missing SNPs can bias the results of association analyses. To maximize information usage, imputation is often adopted to compensate for the missing data by...


Bibliographic Details
Main Authors:	Lin, Yen-Jen; Chang, Chun-Tien; Tang, Chuan Yi; Hsieh, Wen-Ping
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2013
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3581137/ https://www.ncbi.nlm.nih.gov/pubmed/23509783 http://dx.doi.org/10.1155/2013/813912

_version_	1782260375946788864
author	Lin, Yen-Jen; Chang, Chun-Tien; Tang, Chuan Yi; Hsieh, Wen-Ping
author_facet	Lin, Yen-Jen; Chang, Chun-Tien; Tang, Chuan Yi; Hsieh, Wen-Ping
author_sort	Lin, Yen-Jen
collection	PubMed
description	Single nucleotide polymorphism (SNP) data derived from array-based technology or massively parallel sequencing often contain missing values. Missing SNPs can bias the results of association analyses. To maximize information usage, imputation is often adopted to compensate for the missing data by filling in the most probable values. To better understand the available tools for this purpose, we compare the imputation performance of BEAGLE, IMPUTE, BIMBAM, SNPMStat, MACH, and PLINK on data generated by randomly masking genotype data from the International HapMap Phase III project. In addition, we propose a new algorithm called simple imputation (Simpute) that benefits from the high resolution of the SNPs in the array platform. Simpute does not require any reference data. The best feature of Simpute is its computational efficiency, with a complexity of O(mw + n), where n is the number of missing SNPs, w is the number of positions with missing SNPs, and m is the number of people considered. Simpute is suitable for routine screening of large-scale SNP genotyping data, particularly when the sample size is large and efficiency is a major concern in the analysis.
format	Online Article Text
id	pubmed-3581137
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-3581137 2013-03-18 Simpute: An Efficient Solution for Dense Genotypic Data Lin, Yen-Jen; Chang, Chun-Tien; Tang, Chuan Yi; Hsieh, Wen-Ping Biomed Res Int Research Article Hindawi Publishing Corporation 2013 2013-02-03 /pmc/articles/PMC3581137/ /pubmed/23509783 http://dx.doi.org/10.1155/2013/813912 Text en Copyright © 2013 Yen-Jen Lin et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Lin, Yen-Jen Chang, Chun-Tien Tang, Chuan Yi Hsieh, Wen-Ping Simpute: An Efficient Solution for Dense Genotypic Data
title	Simpute: An Efficient Solution for Dense Genotypic Data
title_full	Simpute: An Efficient Solution for Dense Genotypic Data
title_fullStr	Simpute: An Efficient Solution for Dense Genotypic Data
title_full_unstemmed	Simpute: An Efficient Solution for Dense Genotypic Data
title_short	Simpute: An Efficient Solution for Dense Genotypic Data
title_sort	simpute: an efficient solution for dense genotypic data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3581137/ https://www.ncbi.nlm.nih.gov/pubmed/23509783 http://dx.doi.org/10.1155/2013/813912
work_keys_str_mv	AT linyenjen simputeanefficientsolutionfordensegenotypicdata AT changchuntien simputeanefficientsolutionfordensegenotypicdata AT tangchuanyi simputeanefficientsolutionfordensegenotypicdata AT hsiehwenping simputeanefficientsolutionfordensegenotypicdata
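
The description mentions evaluating imputation tools on data produced by randomly masking known genotypes. A minimal sketch of that kind of masking experiment might look like the following. It is not the authors' code and it does not implement Simpute, whose algorithm this record does not describe. The imputer here is a deliberately simple stand-in that fills each gap with the most frequent observed genotype at that SNP. The 0/1/2 genotype coding, the `MISSING` marker, and every function name are assumptions made for illustration.

```python
import random
from collections import Counter

MISSING = None  # hypothetical marker for a masked or missing genotype


def mask_genotypes(genotypes, rate, rng):
    """Return a masked copy of the matrix and the list of hidden (person, snp) cells."""
    masked = [row[:] for row in genotypes]
    hidden = []
    for i, row in enumerate(masked):
        for j, value in enumerate(row):
            if value is not MISSING and rng.random() < rate:
                row[j] = MISSING
                hidden.append((i, j))
    return masked, hidden


def impute_most_frequent(genotypes):
    """Stand-in imputer (NOT Simpute): fill each gap with the most
    common observed genotype at the same SNP position."""
    filled = [row[:] for row in genotypes]
    missing_cells = [
        (i, j)
        for i, row in enumerate(genotypes)
        for j, value in enumerate(row)
        if value is MISSING
    ]
    # Tally observed genotypes only at positions that have a gap.
    most_common = {}
    for j in {j for _, j in missing_cells}:
        counts = Counter(row[j] for row in genotypes if row[j] is not MISSING)
        # If nobody was observed at this SNP, the gap stays unfilled.
        most_common[j] = counts.most_common(1)[0][0] if counts else MISSING
    for i, j in missing_cells:
        filled[i][j] = most_common[j]
    return filled


def accuracy(truth, imputed, hidden):
    """Fraction of hidden cells whose imputed value matches the truth."""
    if not hidden:
        return float("nan")
    correct = sum(1 for i, j in hidden if imputed[i][j] == truth[i][j])
    return correct / len(hidden)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy matrix: rows are people, columns are SNPs, values are assumed
    # minor-allele counts (0, 1, or 2).
    truth = [[rng.choice([0, 0, 1, 2]) for _ in range(20)] for _ in range(50)]
    masked, hidden = mask_genotypes(truth, rate=0.05, rng=rng)
    imputed = impute_most_frequent(masked)
    print(len(hidden), accuracy(truth, imputed, hidden))
```

Once the gaps are located, this stand-in tallies m genotypes at each of the w affected positions and then fills n cells, which has the same mw + n shape that the description quotes. That resemblance is incidental and says nothing about how Simpute itself works.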
