A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data
Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial distribution (NB) and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have b...
Main authors: | Yang, Sheng; Guo, Li; Shao, Fang; Zhao, Yang; Chen, Feng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi Publishing Corporation 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4609795/ https://www.ncbi.nlm.nih.gov/pubmed/26508990 http://dx.doi.org/10.1155/2015/178572 |
_version_ | 1782395848303312896 |
---|---|
author | Yang, Sheng Guo, Li Shao, Fang Zhao, Yang Chen, Feng |
author_facet | Yang, Sheng Guo, Li Shao, Fang Zhao, Yang Chen, Feng |
author_sort | Yang, Sheng |
collection | PubMed |
description | Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial (NB) distribution and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS) algorithms, namely baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimization decision tree, and random forest (RF), was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal-to-noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96) from among the deregulated miRNAs of six datasets from The Cancer Genome Atlas. Taking these findings together and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes. |
format | Online Article Text |
id | pubmed-4609795 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-4609795 2015-10-27 A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data Yang, Sheng Guo, Li Shao, Fang Zhao, Yang Chen, Feng Comput Math Methods Med Research Article Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial (NB) distribution and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS) algorithms, namely baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimization decision tree, and random forest (RF), was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal-to-noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96) from among the deregulated miRNAs of six datasets from The Cancer Genome Atlas. Taking these findings together and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes. Hindawi Publishing Corporation 2015 2015-10-05 /pmc/articles/PMC4609795/ /pubmed/26508990 http://dx.doi.org/10.1155/2015/178572 Text en Copyright © 2015 Sheng Yang et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4609795/ https://www.ncbi.nlm.nih.gov/pubmed/26508990 http://dx.doi.org/10.1155/2015/178572 |
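
The description in this record mentions that the Apriori algorithm was used to find frequent item sets among the deregulated miRNAs of six datasets. As a rough illustration of that general technique only, the following is a minimal Python sketch. It is not the authors' implementation: the function name `frequent_itemsets`, the `min_support` threshold, and the placeholder miRNA labels (`mir-A` to `mir-D`) are all assumptions made up for this example and do not come from the article.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets present in >= min_support transactions.

    Hypothetical helper for illustration; each transaction is one dataset's
    set of deregulated miRNA labels.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        result.update(current)
        k += 1
    return result


# Placeholder input: four made-up "datasets" with made-up labels.
datasets = [
    {"mir-A", "mir-B", "mir-C"},
    {"mir-A", "mir-B"},
    {"mir-A", "mir-B", "mir-D"},
    {"mir-B", "mir-C"},
]
found = frequent_itemsets(datasets, min_support=3)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With this toy input and a support threshold of 3, only `mir-A`, `mir-B`, and the pair `{mir-A, mir-B}` are retained. The pruning step is what distinguishes Apriori from brute-force enumeration: a candidate is counted only if all of its smaller subsets were already frequent.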