Applying compressed sensing to genome-wide association studies

BACKGROUND: The aim of a genome-wide association study (GWAS) is to isolate DNA markers for variants affecting phenotypes of interest. This is constrained by the fact that the number of markers often far exceeds the number of samples. Compressed sensing (CS) is a body of theory regarding signal reco...

Bibliographic Details
Main authors: Vattikuti, Shashaank; Lee, James J; Chang, Christopher C; Hsu, Stephen D H; Chow, Carson C
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4078394/
https://www.ncbi.nlm.nih.gov/pubmed/25002967
http://dx.doi.org/10.1186/2047-217X-3-10
author Vattikuti, Shashaank
Lee, James J
Chang, Christopher C
Hsu, Stephen D H
Chow, Carson C
collection PubMed
description BACKGROUND: The aim of a genome-wide association study (GWAS) is to isolate DNA markers for variants affecting phenotypes of interest. This is constrained by the fact that the number of markers often far exceeds the number of samples. Compressed sensing (CS) is a body of theory regarding signal recovery when the number of predictor variables (i.e., genotyped markers) exceeds the sample size. Its applicability to GWAS has not been investigated. RESULTS: Using CS theory, we show that all markers with nonzero coefficients can be identified (selected) using an efficient algorithm, provided that they are sufficiently few in number (sparse) relative to sample size. For heritability equal to one (h² = 1), there is a sharp phase transition from poor performance to complete selection as the sample size is increased. For heritability below one, complete selection still occurs, but the transition is smoothed. We find for h² ∼ 0.5 that a sample size of approximately thirty times the number of markers with nonzero coefficients is sufficient for full selection. This boundary is only weakly dependent on the number of genotyped markers. CONCLUSION: Practical measures of signal recovery are robust to linkage disequilibrium between a true causal variant and markers residing in the same genomic region. Given a limited sample size, it is possible to discover a phase transition by increasing the penalization; in this case a subset of the support may be recovered. Applying this approach to the GWAS analysis of height, we show that 70-100% of the selected markers are strongly correlated with height-associated markers identified by the GIANT Consortium.
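The selection idea summarized in the abstract can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not the authors' code: the record does not name the algorithm, so L1-penalized regression solved by cyclic coordinate descent is swapped in here as one common compressed-sensing estimator. The design matrix is random Gaussian rather than real genotypes, the phenotype is noiseless (the analogue of h² = 1), and the names `soft_threshold`, `lasso_cd`, and `simulate`, as well as all parameter values, are hypothetical.

```python
import random


def soft_threshold(value, lam):
    """Soft-thresholding operator that produces exact zeros under an L1 penalty."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(cols, y, lam, max_sweeps=1000, tol=1e-8):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    cols[j] holds column j of X (one marker) as a list of n floats.
    """
    n, p = len(y), len(cols)
    z = [sum(c * c for c in col) / n for col in cols]  # mean square of each column
    b = [0.0] * p
    r = list(y)  # residual y - X b, valid because b starts at zero
    for _ in range(max_sweeps):
        biggest_change = 0.0
        for j, col in enumerate(cols):
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(c * ri for c, ri in zip(col, r)) / n + z[j] * b[j]
            new = soft_threshold(rho, lam) / z[j]
            delta = new - b[j]
            if delta != 0.0:
                r = [ri - c * delta for ri, c in zip(r, col)]
                b[j] = new
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < tol:
            break
    return b


def simulate(n, p, s, lam, seed):
    """Return (true support, selected markers) for one simulated data set.

    Assumptions: Gaussian design, s causal markers with effects +1 or -1,
    and no noise in the phenotype.
    """
    rng = random.Random(seed)
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    support = set(rng.sample(range(p), s))
    beta = {j: rng.choice((-1.0, 1.0)) for j in support}
    y = [sum(cols[j][i] * beta[j] for j in support) for i in range(n)]
    b = lasso_cd(cols, y, lam)
    # Treat numerically negligible coefficients as not selected.
    selected = {j for j, bj in enumerate(b) if abs(bj) > 1e-6}
    return support, selected


if __name__ == "__main__":
    truth, chosen = simulate(n=100, p=150, s=3, lam=0.1, seed=1)
    print("true support:", sorted(truth))
    print("selected:    ", sorted(chosen))
```

With more markers than samples (p = 150, n = 100) but only three nonzero effects, the selected set should match the true support in this noiseless setting. Rerunning `simulate` with progressively smaller `n` is one way to probe where selection starts to fail, which is the kind of transition the abstract describes.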
format Online
Article
Text
id pubmed-4078394
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4078394 2014-07-07 Applying compressed sensing to genome-wide association studies Vattikuti, Shashaank; Lee, James J; Chang, Christopher C; Hsu, Stephen D H; Chow, Carson C Gigascience Research
BioMed Central 2014-06-16 /pmc/articles/PMC4078394/ /pubmed/25002967 http://dx.doi.org/10.1186/2047-217X-3-10 Text en Copyright © 2014 Vattikuti et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Applying compressed sensing to genome-wide association studies
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4078394/
https://www.ncbi.nlm.nih.gov/pubmed/25002967
http://dx.doi.org/10.1186/2047-217X-3-10