The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data

Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degree...

Bibliographic Details
Main authors: Kim, Eunji, Ivanov, Ivan, Hua, Jianping, Lampe, Johanna W, Hullar, Meredith AJ, Chapkin, Robert S, Dougherty, Edward R
Format: Online Article Text
Language: English
Published: SAGE Publications 2017
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470876/
https://www.ncbi.nlm.nih.gov/pubmed/28659712
http://dx.doi.org/10.1177/1176935117710530
author Kim, Eunji
Ivanov, Ivan
Hua, Jianping
Lampe, Johanna W
Hullar, Meredith AJ
Chapkin, Robert S
Dougherty, Edward R
author_sort Kim, Eunji
collection PubMed
description Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degrees of imprecision. The problem is compounded by the fact that the accuracy of classification depends on the manner in which the phenomena are transformed into data by the measurement technology. Because next-generation sequencing technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they can potentially produce less discriminative data relative to the actual gene expression levels. In this study, we compare the performance of ranking feature sets derived from a model of RNA-Seq data with that of a multivariate normal model of gene concentrations using 3 measures: (1) ranking power, (2) length of extensions, and (3) Bayes features. This is the model-based study to examine the effectiveness of reporting lists of small feature sets using RNA-Seq data and the effects of different model parameters and error estimators. The results demonstrate that the general trends of the parameter effects on the ranking power of the underlying gene concentrations are preserved in the RNA-Seq data, whereas the power of finding a good feature set becomes weaker when gene concentrations are transformed by the sequencing machine.
format Online
Article
Text
id pubmed-5470876
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-5470876 2017-06-28 Cancer Inform Methodology
SAGE Publications 2017-06-12 /pmc/articles/PMC5470876/ /pubmed/28659712 http://dx.doi.org/10.1177/1176935117710530 Text en © The Author(s) 2017 This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data
topic Methodology