The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data
Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degree...
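Main authors: | Kim, Eunji; Ivanov, Ivan; Hua, Jianping; Lampe, Johanna W; Hullar, Meredith AJ; Chapkin, Robert S; Dougherty, Edward R |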
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications 2017 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470876/ https://www.ncbi.nlm.nih.gov/pubmed/28659712 http://dx.doi.org/10.1177/1176935117710530 |
_version_ | 1783243840119898112 |
---|---|
author | Kim, Eunji Ivanov, Ivan Hua, Jianping Lampe, Johanna W Hullar, Meredith AJ Chapkin, Robert S Dougherty, Edward R |
collection | PubMed |
description | Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degrees of imprecision. The problem is compounded by the fact that the accuracy of classification depends on the manner in which the phenomena are transformed into data by the measurement technology. Because next-generation sequencing technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they can potentially produce less discriminative data relative to the actual gene expression levels. In this study, we compare the performance of ranking feature sets derived from a model of RNA-Seq data with that of a multivariate normal model of gene concentrations using 3 measures: (1) ranking power, (2) length of extensions, and (3) Bayes features. This is the model-based study to examine the effectiveness of reporting lists of small feature sets using RNA-Seq data and the effects of different model parameters and error estimators. The results demonstrate that the general trends of the parameter effects on the ranking power of the underlying gene concentrations are preserved in the RNA-Seq data, whereas the power of finding a good feature set becomes weaker when gene concentrations are transformed by the sequencing machine. |
format | Online Article Text |
id | pubmed-5470876 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-5470876 2017-06-28 Cancer Inform Methodology SAGE Publications 2017-06-12 /pmc/articles/PMC5470876/ /pubmed/28659712 http://dx.doi.org/10.1177/1176935117710530 Text en © The Author(s) 2017 This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
title | The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470876/ https://www.ncbi.nlm.nih.gov/pubmed/28659712 http://dx.doi.org/10.1177/1176935117710530 |