Enhanced Permutation Tests via Multiple Pruning
Big multi-omics data in bioinformatics often consists of a huge number of features and relatively small numbers of samples. In addition, features from multi-omics data have their own specific characteristics depending on whether they are from genomics, proteomics, metabolomics, etc. Due to these dis...
Main authors: | Leem, Sangseob; Huh, Iksoo; Park, Taesung |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7330123/ https://www.ncbi.nlm.nih.gov/pubmed/32670346 http://dx.doi.org/10.3389/fgene.2020.00509 |
_version_ | 1783553046904569856 |
---|---|
author | Leem, Sangseob Huh, Iksoo Park, Taesung |
author_facet | Leem, Sangseob Huh, Iksoo Park, Taesung |
author_sort | Leem, Sangseob |
collection | PubMed |
description | Big multi-omics data in bioinformatics often consists of a huge number of features and relatively small numbers of samples. In addition, features from multi-omics data have their own specific characteristics depending on whether they are from genomics, proteomics, metabolomics, etc. Due to these distinct characteristics, standard statistical analyses using parametric-based assumptions may sometimes fail to provide exact asymptotic results. To resolve this issue, permutation tests can be a way to exactly analyze multi-omics data because they are distribution-free and flexible to use. In permutation tests, p-values are evaluated by estimating the locations of test statistics in an empirical null distribution generated by random shuffling. However, the permutation approach can be infeasible when the number of features increases, because more stringent control of type I error is needed for multiple hypothesis testing, and consequently, much larger numbers of permutations are required to reach significance. To address this problem, we propose a well-organized strategy, “ENhanced Permutation tests via multiple Pruning (ENPP).” ENPP prunes the features in every permutation round if they are determined to be non-significant. In other words, if the feature statistics from the permuted datasets exceed the feature statistics from the original dataset, beyond a predetermined threshold, the feature is determined to be non-significant. If so, ENPP removes the feature and iterates the process without the feature in the next permutation round. Our simulation study showed that the ENPP method could remove about 50% of the features at the first permutation round, and, by the 100th permutation round, 98% of the features had been removed and only 7.4% of the computation time with the original unpruned permutation approach had elapsed. In addition, we applied this approach to a real data set (Korea Association REsource: KARE) of 327,872 SNPs to find association with a non-normally distributed phenotype (fasting plasma glucose), interpreted the results, and discussed the feasibility and advantages of the approach. |
format | Online Article Text |
id | pubmed-7330123 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7330123 2020-07-14 Enhanced Permutation Tests via Multiple Pruning Leem, Sangseob Huh, Iksoo Park, Taesung Front Genet Genetics Frontiers Media S.A. 2020-06-25 /pmc/articles/PMC7330123/ /pubmed/32670346 http://dx.doi.org/10.3389/fgene.2020.00509 Text en Copyright © 2020 Leem, Huh and Park. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Leem, Sangseob Huh, Iksoo Park, Taesung Enhanced Permutation Tests via Multiple Pruning |
title | Enhanced Permutation Tests via Multiple Pruning |
title_sort | enhanced permutation tests via multiple pruning |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7330123/ https://www.ncbi.nlm.nih.gov/pubmed/32670346 http://dx.doi.org/10.3389/fgene.2020.00509 |
work_keys_str_mv | AT leemsangseob enhancedpermutationtestsviamultiplepruning AT huhiksoo enhancedpermutationtestsviamultiplepruning AT parktaesung enhancedpermutationtestsviamultiplepruning |
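
The pruning idea summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`abs_corr`, `pruned_permutation_test`), the choice of absolute Pearson correlation as the test statistic, the `max_exceed` pruning threshold, and the p-value estimate for pruned features are all assumptions made for illustration. The sketch shuffles the phenotype once per round, counts how often each feature's permuted statistic reaches its observed statistic, and stops permuting a feature once that count reaches the threshold.

```python
import numpy as np


def abs_corr(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom


def pruned_permutation_test(X, y, n_perm=1000, max_exceed=10, seed=0):
    """Permutation test that drops features judged non-significant.

    Hypothetical sketch: a feature is pruned once its permuted statistic
    has reached or exceeded the observed statistic `max_exceed` times.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    observed = abs_corr(X, y)

    exceed = np.zeros(n_features, dtype=int)   # exceedance counts
    rounds = np.zeros(n_features, dtype=int)   # rounds each feature took part in
    active = np.arange(n_features)             # indices still being tested

    for _ in range(n_perm):
        if active.size == 0:
            break
        y_perm = rng.permutation(y)
        # Only the surviving features are recomputed in this round.
        perm_stat = abs_corr(X[:, active], y_perm)
        exceed[active] += perm_stat >= observed[active]
        rounds[active] += 1
        # Prune features whose exceedance count hit the threshold.
        active = active[exceed[active] < max_exceed]

    # Crude empirical estimate (an assumption of this sketch); for pruned
    # features it is based only on the rounds completed before pruning.
    p_values = (exceed + 1) / (rounds + 1)
    return p_values, active
```

Calling `pruned_permutation_test(X, y)` with a samples-by-features array `X` and a phenotype vector `y` returns the estimated p-values and the indices of the features that survived every round. Because pruned features are skipped in later rounds, the cost per round shrinks as the active set shrinks, which is the source of the computational savings the description reports.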