
SparRec: An effective matrix completion framework of missing data imputation for GWAS

Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputatio...


Bibliographic Details
Main Authors: Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5071878/
https://www.ncbi.nlm.nih.gov/pubmed/27762341
http://dx.doi.org/10.1038/srep35534
author Jiang, Bo
Ma, Shiqian
Causey, Jason
Qiao, Linbo
Hardin, Matthew Price
Bitts, Ian
Johnson, Daniel
Zhang, Shuzhong
Huang, Xiuzhen
collection PubMed
description Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistics methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, as other matrix completion methods, is flexible to be applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistics methods including Beagle and fastPhase.
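The low-rank idea mentioned in the description can be illustrated with a generic sketch. This is not SparRec's LRMC or MCCF model and not Mendel-Impute; it is a plain iterative truncated-SVD imputation, shown only to make "low-rank matrix completion" concrete. The function names `hard_impute` and `round_to_genotypes`, the parameters `rank`, `n_iters`, and `tol`, and the 0/1/2 genotype coding are assumptions made for this illustration.

```python
import numpy as np


def hard_impute(X, observed, rank=1, n_iters=200, tol=1e-9):
    """Fill unobserved entries of X with a rank-`rank` approximation.

    X        : 2-D float array (samples x SNPs); values at unobserved
               positions are ignored.
    observed : boolean array of the same shape, True where X is known.
    Illustrative only: a generic fixed-rank imputation loop, not the
    optimization model of any published tool.
    """
    # Start by filling the unknown entries with zeros.
    Z = np.where(observed, X, 0.0)
    for _ in range(n_iters):
        # Best rank-`rank` approximation of the current filled-in matrix.
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep the known entries, replace only the unknown ones.
        Z_new = np.where(observed, X, low_rank)
        converged = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if converged:
            break
    return Z


def round_to_genotypes(Z):
    """Map real-valued imputations to the assumed 0/1/2 genotype coding."""
    return np.clip(np.rint(Z), 0, 2).astype(int)
```

In this sketch the observed entries are never altered; only the masked positions are updated on each pass, and the final rounding step is needed only if discrete genotype calls are wanted rather than real-valued dosages.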
format Online
Article
Text
id pubmed-5071878
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5071878 2016-10-26 Sci Rep Article Nature Publishing Group 2016-10-20 /pmc/articles/PMC5071878/ /pubmed/27762341 http://dx.doi.org/10.1038/srep35534 Text en Copyright © 2016, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title SparRec: An effective matrix completion framework of missing data imputation for GWAS
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5071878/
https://www.ncbi.nlm.nih.gov/pubmed/27762341
http://dx.doi.org/10.1038/srep35534