Enhanced Permutation Tests via Multiple Pruning

Bibliographic Details
Main Authors: Leem, Sangseob, Huh, Iksoo, Park, Taesung
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7330123/
https://www.ncbi.nlm.nih.gov/pubmed/32670346
http://dx.doi.org/10.3389/fgene.2020.00509
_version_ 1783553046904569856
author Leem, Sangseob
Huh, Iksoo
Park, Taesung
author_facet Leem, Sangseob
Huh, Iksoo
Park, Taesung
author_sort Leem, Sangseob
collection PubMed
description Big multi-omics data in bioinformatics often consists of a huge number of features and a relatively small number of samples. In addition, features from multi-omics data have their own specific characteristics depending on whether they come from genomics, proteomics, metabolomics, etc. Because of these distinct characteristics, standard statistical analyses based on parametric assumptions may sometimes fail to provide exact asymptotic results. To resolve this issue, permutation tests offer a way to analyze multi-omics data exactly, because they are distribution-free and flexible to use. In permutation tests, p-values are evaluated by estimating the locations of test statistics in an empirical null distribution generated by random shuffling. However, the permutation approach can become infeasible as the number of features increases, because more stringent control of type I error is needed for multiple hypothesis testing and, consequently, much larger numbers of permutations are required to reach significance. To address this problem, we propose a strategy, “ENhanced Permutation tests via multiple Pruning (ENPP).” ENPP prunes features in every permutation round if they are determined to be non-significant. In other words, if the feature statistics from the permuted datasets exceed the feature statistic from the original dataset beyond a predetermined threshold, the feature is determined to be non-significant. ENPP then removes the feature and continues the process without it in the next permutation round. Our simulation study showed that the ENPP method could remove about 50% of the features at the first permutation round and that, by the 100th permutation round, 98% of the features had been removed while only 7.4% of the computation time of the original unpruned permutation approach had elapsed.
In addition, we applied this approach to a real data set (Korea Association REsource: KARE) of 327,872 SNPs to test for association with a non-normally distributed phenotype (fasting plasma glucose), interpreted the results, and discussed the feasibility and advantages of the approach.
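The pruning idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' ENPP implementation. The function name `pruned_permutation_test`, the parameter `max_exceed`, the use of absolute Pearson correlation as the test statistic, and the reading of the "predetermined threshold" as a cap on how many permuted statistics may match or exceed the observed one are all assumptions made here for illustration.

```python
import numpy as np


def pruned_permutation_test(X, y, n_perm=1000, max_exceed=10, seed=0):
    """Permutation test that drops features judged non-significant.

    Hypothetical sketch: X is (n_samples, n_features), y is (n_samples,).
    The statistic is |Pearson correlation| (an assumption). A feature is
    pruned once its permuted statistic has matched or exceeded the
    observed statistic more than `max_exceed` times (an assumed rule).
    Assumes no column of X is constant.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]

    # Centre and scale columns once so each round is a matrix product.
    Xc = X - X.mean(axis=0)
    Xc /= np.linalg.norm(Xc, axis=0)

    def stat(cols, yv):
        yc = yv - yv.mean()
        return np.abs(Xc[:, cols].T @ yc) / np.linalg.norm(yc)

    observed = stat(np.arange(n_features), y)
    exceed = np.zeros(n_features, dtype=int)  # permuted >= observed counts
    rounds = np.zeros(n_features, dtype=int)  # rounds each feature took part in
    active = np.arange(n_features)            # indices of unpruned features

    for _ in range(n_perm):
        if active.size == 0:
            break
        perm_stat = stat(active, rng.permutation(y))
        exceed[active] += perm_stat >= observed[active]
        rounds[active] += 1
        # Keep only features still under the exceedance cap.
        active = active[exceed[active] <= max_exceed]

    # Empirical p-value from the rounds each feature actually completed.
    p_values = (exceed + 1) / (rounds + 1)
    pruned = np.ones(n_features, dtype=bool)
    pruned[active] = False
    return p_values, pruned
```

In this sketch the work per round shrinks as `active` shrinks, which is the source of the time savings the abstract reports. The p-value returned for a pruned feature is only an estimate based on the rounds completed before it was dropped.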
format Online
Article
Text
id pubmed-7330123
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7330123 2020-07-14 Enhanced Permutation Tests via Multiple Pruning Leem, Sangseob Huh, Iksoo Park, Taesung Front Genet Genetics Frontiers Media S.A. 2020-06-25 /pmc/articles/PMC7330123/ /pubmed/32670346 http://dx.doi.org/10.3389/fgene.2020.00509 Text en Copyright © 2020 Leem, Huh and Park. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Leem, Sangseob
Huh, Iksoo
Park, Taesung
Enhanced Permutation Tests via Multiple Pruning
title Enhanced Permutation Tests via Multiple Pruning
title_full Enhanced Permutation Tests via Multiple Pruning
title_fullStr Enhanced Permutation Tests via Multiple Pruning
title_full_unstemmed Enhanced Permutation Tests via Multiple Pruning
title_short Enhanced Permutation Tests via Multiple Pruning
title_sort enhanced permutation tests via multiple pruning
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7330123/
https://www.ncbi.nlm.nih.gov/pubmed/32670346
http://dx.doi.org/10.3389/fgene.2020.00509
work_keys_str_mv AT leemsangseob enhancedpermutationtestsviamultiplepruning
AT huhiksoo enhancedpermutationtestsviamultiplepruning
AT parktaesung enhancedpermutationtestsviamultiplepruning