
The impact of feature selection on one and two-class classification performance for plant microRNAs

MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure, which is recognized by a complex enzyme machinery. This recognition ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in the regulation of target mRNAs. It is...


Bibliographic Details
Main authors: Khalifa, Waleed; Yousef, Malik; Saçar Demirci, Müşerref Duygu; Allmer, Jens
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4924126/
https://www.ncbi.nlm.nih.gov/pubmed/27366641
http://dx.doi.org/10.7717/peerj.2135
_version_ 1782439809829044224
author Khalifa, Waleed
Yousef, Malik
Saçar Demirci, Müşerref Duygu
Allmer, Jens
collection PubMed
description MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure, which is recognized by a complex enzyme machinery. This recognition ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in the regulation of target mRNAs. Determining miRNAs experimentally is involved and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although two-class classification (TCC) is generally used in the field, one-class classification (OCC) has been tried for pre-miRNA detection because negative examples are hard to come by. Since both positive and negative examples are currently somewhat limited, feature selection can prove vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC: the best feature selection method achieved an average accuracy of 95.6%, ∼29 percentage points better than the worst method, which achieved 66.9% accuracy. While OCC performance is comparable to that of TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection; its largest performance gap is ∼13%, which occurs for only two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that OCC can perform on par with TCC given the proper set of features.
format Online
Article
Text
id pubmed-4924126
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-4924126 2016-06-30 PeerJ Bioinformatics PeerJ Inc. 2016-06-21 /pmc/articles/PMC4924126/ /pubmed/27366641 http://dx.doi.org/10.7717/peerj.2135 Text en ©2016 Khalifa et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title The impact of feature selection on one and two-class classification performance for plant microRNAs
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4924126/
https://www.ncbi.nlm.nih.gov/pubmed/27366641
http://dx.doi.org/10.7717/peerj.2135