A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data

Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial distribution (NB) and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have b...

Bibliographic Details
Main Authors: Yang, Sheng, Guo, Li, Shao, Fang, Zhao, Yang, Chen, Feng
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4609795/
https://www.ncbi.nlm.nih.gov/pubmed/26508990
http://dx.doi.org/10.1155/2015/178572
author Yang, Sheng
Guo, Li
Shao, Fang
Zhao, Yang
Chen, Feng
collection PubMed
description Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial distribution (NB) and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS) algorithms, including baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimistic decision tree, and random forest (RF), was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal to noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96) from among the deregulated miRNAs of six datasets from The Cancer Genomics Atlas. Taking these findings altogether and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes.
format Online
Article
Text
id pubmed-4609795
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4609795 2015-10-27 Comput Math Methods Med Research Article Hindawi Publishing Corporation 2015 2015-10-05 /pmc/articles/PMC4609795/ /pubmed/26508990 http://dx.doi.org/10.1155/2015/178572 Text en Copyright © 2015 Sheng Yang et al.
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data
topic Research Article