On the evaluation of the fidelity of supervised classifiers in the prediction of chimeric RNAs

BACKGROUND: High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility that chRNAs expressed specifically in disease can be used as potential biomarkers in both diagnosis and prognosis. RESULTS: The task of discriminating true chRNAs from...

Bibliographic Details
Main authors:	Beaumeunier, Sacha; Audoux, Jérôme; Boureux, Anthony; Ruffle, Florence; Commes, Thérèse; Philippe, Nicolas; Alves, Ronnie
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5090896/ https://www.ncbi.nlm.nih.gov/pubmed/27822312 http://dx.doi.org/10.1186/s13040-016-0112-6

_version_	1782464476126117888
author	Beaumeunier, Sacha; Audoux, Jérôme; Boureux, Anthony; Ruffle, Florence; Commes, Thérèse; Philippe, Nicolas; Alves, Ronnie
author_sort	Beaumeunier, Sacha
collection	PubMed
description	BACKGROUND: High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility that chRNAs expressed specifically in disease can be used as potential biomarkers in both diagnosis and prognosis. RESULTS: The task of discriminating true chRNAs from false ones poses an interesting Machine Learning (ML) challenge. First, the sequencing data may contain false reads due to technical artifacts, and during the analysis process bioinformatics tools may generate false positives due to methodological biases. Moreover, even if we succeed in obtaining a proper set of observations (enough sequencing data) about true chRNAs, chances are that the devised model will not be able to generalize beyond it. As in any other machine learning problem, the first big issue is finding good data to build models. As far as we know, there is no common benchmark data available for chRNA detection. A classification baseline is also lacking in the related literature. In this work we move towards benchmark data and an evaluation of the fidelity of supervised classifiers in the prediction of chRNAs. CONCLUSIONS: We proposed a modeling strategy that can be used to increase tool performance in chRNA classification, based on a simulated data generator that makes it possible to continuously integrate new complex chimeric events. The pipeline incorporated a genome mutation process and simulated RNA-seq data. Reads at distinct depths were aligned and analysed by CRAC, which integrates genomic location and local coverage, allowing biological predictions at the read scale. Additionally, these reads were functionally annotated and aggregated to form chRNA events, making it possible to evaluate the performance of ML methods (classifiers) at both the read and event levels. Ensemble learning strategies proved more robust for this classification problem, providing an average AUC of 95 % (ACC=94 %, Kappa=0.87). The resulting classification models were also tested on real RNA-seq data from a set of twenty-seven patients with acute myeloid leukemia (AML). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-016-0112-6) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5090896
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5090896 2016-11-07 BioData Min Research BioMed Central 2016-11-02 /pmc/articles/PMC5090896/ /pubmed/27822312 http://dx.doi.org/10.1186/s13040-016-0112-6 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	On the evaluation of the fidelity of supervised classifiers in the prediction of chimeric RNAs
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5090896/ https://www.ncbi.nlm.nih.gov/pubmed/27822312 http://dx.doi.org/10.1186/s13040-016-0112-6
