Enhanced Permutation Tests via Multiple Pruning

Big multi-omics data in bioinformatics often consists of a huge number of features and relatively small numbers of samples. In addition, features from multi-omics data have their own specific characteristics depending on whether they are from genomics, proteomics, metabolomics, etc. Due to these dis...

Bibliographic Details
Main Authors:	Leem, Sangseob, Huh, Iksoo, Park, Taesung
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2020
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7330123/ https://www.ncbi.nlm.nih.gov/pubmed/32670346 http://dx.doi.org/10.3389/fgene.2020.00509

author	Leem, Sangseob Huh, Iksoo Park, Taesung
collection	PubMed
description	Big multi-omics data in bioinformatics often consists of a huge number of features and relatively small numbers of samples. In addition, features from multi-omics data have their own specific characteristics depending on whether they are from genomics, proteomics, metabolomics, etc. Due to these distinct characteristics, standard statistical analyses using parametric-based assumptions may sometimes fail to provide exact asymptotic results. To resolve this issue, permutation tests can be a way to exactly analyze multi-omics data because they are distribution-free and flexible to use. In permutation tests, p-values are evaluated by estimating the locations of test statistics in an empirical null distribution generated by random shuffling. However, the permutation approach can be infeasible when the number of features increases, because more stringent control of type I error is needed for multiple hypothesis testing, and consequently, much larger numbers of permutations are required to reach significance. To address this problem, we propose a well-organized strategy, “ENhanced Permutation tests via multiple Pruning (ENPP).” ENPP prunes the features in every permutation round if they are determined to be non-significant. In other words, if the feature statistics from the permuted datasets exceed the feature statistics from the original dataset, beyond a predetermined threshold, the feature is determined to be non-significant. If so, ENPP removes the feature and iterates the process without the feature in the next permutation round. Our simulation study showed that the ENPP method could remove about 50% of the features at the first permutation round, and, by the 100th permutation round, 98% of the features had been removed and only 7.4% of the computation time with the original unpruned permutation approach had elapsed. 
In addition, we applied this approach to a real data set (Korea Association REsource: KARE) of 327,872 SNPs to find association with a non-normally distributed phenotype (fasting plasma glucose), interpreted the results, and discussed the feasibility and advantages of the approach.
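
The pruning idea described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the test statistic (absolute Pearson correlation), the function names `abs_corr` and `pruned_permutation_test`, the parameter `max_exceed`, and the reading of the "predetermined threshold" as a cap on the number of permutation rounds in which the permuted statistic reaches the observed one are all assumptions made for the example.

```python
import numpy as np


def abs_corr(X, y):
    """Absolute Pearson correlation of each column of X with y.

    Assumes no column of X is constant (otherwise the denominator is zero).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom


def pruned_permutation_test(X, y, n_rounds=1000, max_exceed=1, seed=0):
    """Permutation test that drops features once they look non-significant.

    max_exceed is a hypothetical pruning threshold: a feature is removed as
    soon as its permuted statistic has reached the observed statistic in
    max_exceed rounds.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    observed = abs_corr(X, y)                 # statistics on the original data
    exceed = np.zeros(n_features, dtype=int)  # exceedance count per feature
    rounds = np.zeros(n_features, dtype=int)  # rounds each feature took part in
    active = np.arange(n_features)            # indices of features still tested

    for _ in range(n_rounds):
        if active.size == 0:
            break
        y_perm = rng.permutation(y)           # shuffle the phenotype labels
        perm_stat = abs_corr(X[:, active], y_perm)
        exceed[active] += perm_stat >= observed[active]
        rounds[active] += 1
        # Prune: keep only features still below the exceedance threshold.
        active = active[exceed[active] < max_exceed]

    # Add-one estimate; for pruned features it rests on few rounds and is crude.
    p_est = (exceed + 1) / (rounds + 1)
    return p_est, rounds, active


# Toy data (made up for illustration): feature 0 is associated with y.
rng = np.random.default_rng(1)
y = rng.normal(size=50)
X = rng.normal(size=(50, 200))
X[:, 0] = y + 0.1 * rng.normal(size=50)

p_est, rounds, active = pruned_permutation_test(X, y, n_rounds=200)
print("features still active:", active.size)
print("statistic evaluations used:", rounds.sum(), "of", 200 * 200)
```

Because pruned features are skipped in later rounds, the total number of statistic evaluations (`rounds.sum()`) gives a rough view of the work saved relative to the unpruned `n_rounds * n_features`. The savings reported in the abstract come from the authors' own simulations, not from this sketch.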
format	Online Article Text
id	pubmed-7330123
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-7330123 2020-07-14 Enhanced Permutation Tests via Multiple Pruning Leem, Sangseob Huh, Iksoo Park, Taesung Front Genet Genetics Frontiers Media S.A. 2020-06-25 /pmc/articles/PMC7330123/ /pubmed/32670346 http://dx.doi.org/10.3389/fgene.2020.00509 Text en Copyright © 2020 Leem, Huh and Park. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Enhanced Permutation Tests via Multiple Pruning
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7330123/ https://www.ncbi.nlm.nih.gov/pubmed/32670346 http://dx.doi.org/10.3389/fgene.2020.00509
