miRBoost: boosting support vector machines for microRNA precursor classification

Identification of microRNAs (miRNAs) is an important step toward understanding post-transcriptional gene regulation and miRNA-related pathology. Difficulties in identifying miRNAs through experimental techniques combined with the huge amount of data from new sequencing technologies have made in sili...

Bibliographic Details
Main Authors: Tran, Van Du T., Tempel, Sebastien, Zerath, Benjamin, Zehraoui, Farida, Tahi, Fariza
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2015
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4408786/
https://www.ncbi.nlm.nih.gov/pubmed/25795417
http://dx.doi.org/10.1261/rna.043612.113
author Tran, Van Du T.
Tempel, Sebastien
Zerath, Benjamin
Zehraoui, Farida
Tahi, Fariza
collection PubMed
description Identification of microRNAs (miRNAs) is an important step toward understanding post-transcriptional gene regulation and miRNA-related pathology. Difficulties in identifying miRNAs through experimental techniques, combined with the huge amount of data from new sequencing technologies, have made in silico discrimination of bona fide miRNA precursors from non-miRNA hairpin-like structures an important topic in bioinformatics. Among the various techniques developed for this classification problem, machine learning approaches have proved to be the most promising. However, these approaches require training data, which is problematic because the imbalance between the number of miRNAs (positive data) and non-miRNAs (negative data) degrades their performance. To address this issue, we present an ensemble method that uses a boosting technique with support vector machine components to deal with imbalanced training data. Classification is performed following a feature selection on 187 novel and existing features. The algorithm, miRBoost, performed better than state-of-the-art methods on imbalanced human and cross-species data. It also showed the highest ability among the tested methods for discovering novel miRNA precursors. In addition, miRBoost was over 1400 times faster than the second most accurate tool tested and was significantly faster than most of the other tools. miRBoost thus provides a good compromise between prediction efficiency and execution time, making it highly suitable for use in genome-wide miRNA precursor prediction. The software miRBoost is available on our web server http://EvryRNA.ibisc.univ-evry.fr.
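The abstract describes boosting with support vector machine components on imbalanced training data. The following is a minimal sketch of that general idea, not the miRBoost implementation: it assumes AdaBoost-style reweighting, a linear SVM fitted by subgradient descent, class-balanced initial weights, and a toy two-dimensional data set standing in for the 187 features. All function names, parameters, and data here are hypothetical.

```python
import math
import random


def train_weighted_linear_svm(X, y, w, epochs=30, lam=0.01, lr=0.1):
    """Fit a linear SVM by subgradient descent on the weighted hinge loss."""
    n = len(X)
    theta = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi, wi in zip(X, y, w):
            margin = yi * (sum(t * v for t, v in zip(theta, xi)) + bias)
            theta = [t * (1.0 - lr * lam) for t in theta]  # L2 shrinkage
            if margin < 1.0:
                step = lr * wi * n  # wi * n equals 1 under uniform weights
                theta = [t + step * yi * v for t, v in zip(theta, xi)]
                bias += step * yi
    return theta, bias


def predict_one(model, x):
    """Return +1 or -1 from a single linear SVM."""
    theta, bias = model
    return 1 if sum(t * v for t, v in zip(theta, x)) + bias >= 0 else -1


def boost_svms(X, y, rounds=5):
    """AdaBoost-style loop over weighted SVMs (assumed scheme, see lead-in)."""
    n_pos = sum(1 for label in y if label == 1)
    n_neg = len(y) - n_pos
    # Assumption: start with equal total weight per class so the rare
    # positive class is not swamped by the negatives.
    w = [0.5 / n_pos if label == 1 else 0.5 / n_neg for label in y]
    ensemble = []
    for _ in range(rounds):
        model = train_weighted_linear_svm(X, y, w)
        preds = [predict_one(model, x) for x in X]
        err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
        if err >= 0.5:  # no better than chance on the weighted data
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect model
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, model))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * t * p) for wi, p, t in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all SVMs in the ensemble."""
    score = sum(alpha * predict_one(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1


# Toy imbalanced data: 100 negatives and 10 positives (hypothetical).
rng = random.Random(0)
X = [(rng.gauss(-2, 1), rng.gauss(-2, 1)) for _ in range(100)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(10)]
y = [-1] * 100 + [1] * 10

ensemble = boost_svms(X, y)
train_preds = [predict(ensemble, x) for x in X]
recall = sum(p == 1 for p, t in zip(train_preds, y) if t == 1) / 10
print(f"{len(ensemble)} SVMs, recall on positives = {recall:.2f}")
```

Recall on the positive class is printed because plain accuracy can look high on imbalanced data even when most positives are missed. In practice an established SVM library would likely replace the hand-written trainer.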
format Online
Article
Text
id pubmed-4408786
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-4408786 2015-05-01 miRBoost: boosting support vector machines for microRNA precursor classification Tran, Van Du T. Tempel, Sebastien Zerath, Benjamin Zehraoui, Farida Tahi, Fariza RNA Bioinformatics
Cold Spring Harbor Laboratory Press 2015-05 /pmc/articles/PMC4408786/ /pubmed/25795417 http://dx.doi.org/10.1261/rna.043612.113 Text en © 2015 Tran et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society http://creativecommons.org/licenses/by-nc/4.0/ This article, published in RNA, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title miRBoost: boosting support vector machines for microRNA precursor classification
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4408786/
https://www.ncbi.nlm.nih.gov/pubmed/25795417
http://dx.doi.org/10.1261/rna.043612.113