ReliefSeq: A Gene-Wise Adaptive-K Nearest-Neighbor Feature Selection Tool for Finding Gene-Gene Interactions and Main Effects in mRNA-Seq Gene Expression Data

Bibliographic Details
Main authors: McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.
Format: Online article, text
Language: English
Published: Public Library of Science, 2013
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3858248/
https://www.ncbi.nlm.nih.gov/pubmed/24339943
http://dx.doi.org/10.1371/journal.pone.0081527
Collection: PubMed
Description: Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data sets that span a range of distributional properties seen in real mRNA-seq data, including multiple transcripts with main effects and interaction effects of varying size. For simulated main effects, gwak-Relief-F feature selection performs comparably to the standard tools DESeq and edgeR at ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations, where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests at detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods in computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples.
ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software availability: http://insilico.utulsa.edu/ReliefSeq.php.
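The description states that ReliefSeq adapts the number of nearest neighbors k for each gene so as to optimize that gene's Relief-F importance score. The Python/NumPy sketch below illustrates that idea under several assumptions that are not taken from the record: two classes, a numeric samples-by-genes matrix (for example, already normalized counts), Manhattan distance over range-scaled genes, and "optimize" read as taking each gene's maximum score over k = 1..k_max. The function name `gwak_relieff` and the parameter `k_max` are hypothetical; this is not the ReliefSeq code, and the actual software may differ in its distance metric, normalization, and k-selection rule.

```python
import numpy as np


def gwak_relieff(X, y, k_max):
    """Hypothetical gene-wise adaptive-k Relief-F sketch (two classes only).

    X: (n_samples, n_genes) numeric matrix; y: length-n array of class labels.
    Returns (best_scores, best_k): for each gene, its largest Relief-F
    score over k = 1..k_max and the k that produced it (smallest k on ties).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape

    # Range-scale each gene to [0, 1], so diff(a, x, z) = |x_a - z_a| / (max_a - min_a).
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant genes become all zeros, so their diffs are 0
    Xs = (X - X.min(axis=0)) / span

    # Pairwise Manhattan distances between samples over all scaled genes.
    # Builds an n x n x p intermediate: fine for small illustrative data.
    dist = np.abs(Xs[:, None, :] - Xs[None, :, :]).sum(axis=2)

    # scores[k - 1, a] accumulates the Relief-F score of gene a using k neighbors.
    scores = np.zeros((k_max, p))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        other = np.flatnonzero(y != y[i])
        if len(same) < k_max or len(other) < k_max:
            raise ValueError("k_max exceeds the number of available hits or misses")
        # k_max nearest hits (same class) and misses (other class), closest first.
        hits = same[np.argsort(dist[i, same], kind="stable")[:k_max]]
        misses = other[np.argsort(dist[i, other], kind="stable")[:k_max]]
        # Row j holds per-gene diffs summed over the j + 1 nearest neighbors,
        # which yields the contribution for every k in a single pass.
        hit_cum = np.cumsum(np.abs(Xs[hits] - Xs[i]), axis=0)
        miss_cum = np.cumsum(np.abs(Xs[misses] - Xs[i]), axis=0)
        # Relief-F rewards genes that differ across classes and agree within a class.
        scores += miss_cum - hit_cum

    # Average over samples and neighbors: row k - 1 is divided by n * k.
    scores /= (n * np.arange(1, k_max + 1))[:, None]

    # Gene-wise adaptive k: keep each gene's best score and the k achieving it.
    best_idx = scores.argmax(axis=0)
    best_scores = scores[best_idx, np.arange(p)]
    return best_scores, best_idx + 1
```

For example, with six samples split evenly into two classes, a gene that is 0 in every sample of one class and 1 in every sample of the other reaches the maximum possible score of 1.0 at every k. Genes whose values are spread the same way in both classes score lower. Accumulating cumulative sums over the sorted neighbor lists gives the score for every k in one pass, rather than rerunning Relief-F once per k.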
Identifier: pubmed-3858248
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), Public Library of Science, published 2013-12-10
Copyright: © 2013 McKinney et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.