miRBoost: boosting support vector machines for microRNA precursor classification
Main authors: | Tran, Van Du T.; Tempel, Sebastien; Zerath, Benjamin; Zehraoui, Farida; Tahi, Fariza |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press, 2015 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4408786/ https://www.ncbi.nlm.nih.gov/pubmed/25795417 http://dx.doi.org/10.1261/rna.043612.113 |
_version_ | 1782368106154295296 |
---|---|
author | Tran, Van Du T. Tempel, Sebastien Zerath, Benjamin Zehraoui, Farida Tahi, Fariza |
author_sort | Tran, Van Du T. |
collection | PubMed |
description | Identification of microRNAs (miRNAs) is an important step toward understanding post-transcriptional gene regulation and miRNA-related pathology. Difficulties in identifying miRNAs through experimental techniques, combined with the huge amount of data from new sequencing technologies, have made in silico discrimination of bona fide miRNA precursors from non-miRNA hairpin-like structures an important topic in bioinformatics. Among the various techniques developed for this classification problem, machine learning approaches have proved to be the most promising. However, these approaches require training data, which is problematic because the imbalance between the number of miRNAs (positive data) and non-miRNAs (negative data) degrades their performance. To address this issue, we present an ensemble method that uses a boosting technique with support vector machine components to deal with imbalanced training data. Classification is performed following a feature selection on 187 novel and existing features. The algorithm, miRBoost, performed better than state-of-the-art methods on imbalanced human and cross-species data. It also showed the highest ability among the tested methods for discovering novel miRNA precursors. In addition, miRBoost was over 1400 times faster than the second most accurate tool tested and was significantly faster than most of the other tools. miRBoost thus provides a good compromise between prediction efficiency and execution time, making it highly suitable for use in genome-wide miRNA precursor prediction. The software miRBoost is available on our web server http://EvryRNA.ibisc.univ-evry.fr. |
format | Online Article Text |
id | pubmed-4408786 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4408786 2015-05-01 RNA Bioinformatics Cold Spring Harbor Laboratory Press 2015-05 /pmc/articles/PMC4408786/ /pubmed/25795417 http://dx.doi.org/10.1261/rna.043612.113 Text en © 2015 Tran et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society http://creativecommons.org/licenses/by-nc/4.0/ This article, published in RNA, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. |
title | miRBoost: boosting support vector machines for microRNA precursor classification |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4408786/ https://www.ncbi.nlm.nih.gov/pubmed/25795417 http://dx.doi.org/10.1261/rna.043612.113 |
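The abstract in the description field mentions boosting with support vector machine components on imbalanced training data, but this record gives no algorithmic detail. The following is only a minimal sketch of that general idea, not the miRBoost implementation: the AdaBoost-style reweighting, the linear SVM trained by subgradient descent on a weighted hinge loss, the class-balanced initial weights, and every name and parameter (`train_linear_svm`, `boost_svms`, `epochs`, `lr`, `lam`) are assumptions made for illustration.

```python
import math

def train_linear_svm(X, y, sample_w, epochs=200, lr=0.1, lam=0.01):
    """Weighted linear SVM via full-batch subgradient descent on the hinge loss."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 regulariser
        gb = 0.0
        for xi, yi, si in zip(X, y, sample_w):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin
                for j in range(d):
                    gw[j] -= si * yi * xi[j]
                gb -= si * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict_one(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def boost_svms(X, y, rounds=5):
    """AdaBoost-style loop with SVM weak learners (hypothetical sketch)."""
    n_pos = sum(1 for yi in y if yi == 1)
    n_neg = len(y) - n_pos
    # Assumption: give each class half of the total weight to offset imbalance.
    weights = [0.5 / n_pos if yi == 1 else 0.5 / n_neg for yi in y]
    ensemble = []
    eps = 1e-10
    for _ in range(rounds):
        model = train_linear_svm(X, y, weights)
        preds = [predict_one(model, x) for x in X]
        err = sum(w for w, p, yi in zip(weights, preds, y) if p != yi)
        if err >= 0.5:  # no better than chance on the weighted data
            break
        alpha = 0.5 * math.log((1 - err + eps) / (err + eps))
        ensemble.append((alpha, model))
        if err == 0:  # nothing left to reweight
            break
        # Increase the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * yi * p)
                   for w, p, yi in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * predict_one(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1

# Toy imbalanced data: 2 positives, 10 negatives, one made-up feature each.
X = [[2.0], [3.0]] + [[-2.0 - 0.5 * k] for k in range(10)]
y = [1, 1] + [-1] * 10
ensemble = boost_svms(X, y)
```

On this separable toy set the loop is expected to stop early, because a learner with zero weighted error leaves nothing to reweight. Real precursor data would use many features per hairpin (the abstract mentions a selection from 187) and would normally rely on an established SVM library rather than this hand-rolled trainer.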