Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data
In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required in order to identify disease/trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistic...
Main Authors: | Ko, Hyoseok; Kim, Kipoong; Sun, Hokeun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Korea Genome Organization 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5287123/ https://www.ncbi.nlm.nih.gov/pubmed/28154510 http://dx.doi.org/10.5808/GI.2016.14.4.187 |
_version_ | 1782504109617709056 |
---|---|
author | Ko, Hyoseok Kim, Kipoong Sun, Hokeun |
author_facet | Ko, Hyoseok Kim, Kipoong Sun, Hokeun |
author_sort | Ko, Hyoseok |
collection | PubMed |
description | In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required to identify disease- or trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistical testing procedures based on individual tests suffer from multiple testing issues, such as control of the family-wise error rate and dependence among tests. Moreover, the main interest in genetic association studies is detecting only a few genes associated with a phenotype outcome among tens of thousands of genes. For this reason, regularization procedures, in which a phenotype outcome is regressed on all genomic markers and the regression coefficients are estimated from a penalized likelihood, have been considered a good alternative approach to the analysis of high-dimensional genomic data. However, the selection performance of regularization procedures has rarely been compared with that of statistical group testing procedures. In this article, we performed extensive simulation studies in which commonly used group testing procedures, such as principal component analysis, Hotelling's T² test, and the permutation test, are compared with the group lasso (least absolute shrinkage and selection operator) in terms of true positive selection. We also applied all methods considered in the simulation studies to identify genes associated with ovarian cancer from over 20,000 genetic sites generated from the Illumina Infinium HumanMethylation27K BeadChip. We found a large discrepancy in the selected genes between the multiple group testing procedures and the group lasso. |
format | Online Article Text |
id | pubmed-5287123 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Korea Genome Organization |
record_format | MEDLINE/PubMed |
spelling | pubmed-52871232017-02-02 Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data Ko, Hyoseok Kim, Kipoong Sun, Hokeun Genomics Inform Original Article In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required in order to identify disease/trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistical testing procedures based on an individual test suffer from multiple testing issues such as the control of family-wise error rate and dependent tests. Moreover, detecting only a few of genes associated with a phenotype outcome among tens of thousands of genes is of main interest in genetic association studies. In this reason regularization procedures, where a phenotype outcome regresses on all genomic markers and then regression coefficients are estimated based on a penalized likelihood, have been considered as a good alternative approach to analysis of high-dimensional genomic data. But, selection performance of regularization procedures has been rarely compared with that of statistical group testing procedures. In this article, we performed extensive simulation studies where commonly used group testing procedures such as principal component analysis, Hotelling's T(2) test, and permutation test are compared with group lasso (least absolute selection and shrinkage operator) in terms of true positive selection. Also, we applied all methods considered in simulation studies to identify genes associated with ovarian cancer from over 20,000 genetic sites generated from Illumina Infinium HumanMethylation27K Beadchip. We found a big discrepancy of selected genes between multiple group testing procedures and group lasso. 
Korea Genome Organization 2016-12 2016-12-30 /pmc/articles/PMC5287123/ /pubmed/28154510 http://dx.doi.org/10.5808/GI.2016.14.4.187 Text en Copyright © 2016 by the Korea Genome Organization http://creativecommons.org/licenses/by-nc/4.0/ It is identical to the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/). |
spellingShingle | Original Article Ko, Hyoseok Kim, Kipoong Sun, Hokeun Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data |
title | Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data |
title_sort | multiple group testing procedures for analysis of high-dimensional genomic data |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5287123/ https://www.ncbi.nlm.nih.gov/pubmed/28154510 http://dx.doi.org/10.5808/GI.2016.14.4.187 |
work_keys_str_mv | AT kohyoseok multiplegrouptestingproceduresforanalysisofhighdimensionalgenomicdata AT kimkipoong multiplegrouptestingproceduresforanalysisofhighdimensionalgenomicdata AT sunhokeun multiplegrouptestingproceduresforanalysisofhighdimensionalgenomicdata |