A machine learning framework for accurately recognizing circular RNAs for clinical decision-supporting

BACKGROUND: Circular RNAs (circRNAs) are RNA molecules that lack poly(A) tails and form a closed-loop structure. Recent studies have emphasized that some circRNAs have functions different from those of canonical transcripts and are associated with complex diseases. Several computational m...

Bibliographic Details
Main authors: Wang, Yidan; Zhang, Xuanping; Wang, Tao; Xing, Jinchun; Wu, Zhun; Li, Wei; Wang, Jiayin
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7346313/
https://www.ncbi.nlm.nih.gov/pubmed/32646420
http://dx.doi.org/10.1186/s12911-020-1117-0
author Wang, Yidan
Zhang, Xuanping
Wang, Tao
Xing, Jinchun
Wu, Zhun
Li, Wei
Wang, Jiayin
author_sort Wang, Yidan
collection PubMed
description BACKGROUND: Circular RNAs (circRNAs) are RNA molecules that lack poly(A) tails and form a closed-loop structure. Recent studies have emphasized that some circRNAs have functions different from those of canonical transcripts and are associated with complex diseases. Several computational methods have been developed for detecting circRNAs from RNA-seq data. However, the existing methods favor high-sensitivity strategies, which introduce many false positives. Thus, a clinical decision-support system needs a comprehensive filtering approach that accurately recognizes real circRNAs for its decision models. METHODS: In this paper, we first reviewed the detection strategies of the existing methods. Based on the features available from RNA-seq data, we showed that no single feature (data signal) selected by the existing strategies can accurately distinguish a circRNA. However, we found that some combinations of those features (data signals) could be used as signatures for recognizing circRNAs. To avoid the high computational complexity of the combinatorial optimization problem, we present CIRCPlus2, which adopts a machine learning framework to recognize real circRNAs according to multiple data signals captured from RNA-seq data. After comparing multiple machine learning frameworks, we adopted a Gradient Boosting Decision Tree (GBDT) framework for CIRCPlus2. RESULTS: Given a set of candidate circRNAs reported by any existing detection tool(s), the features of each candidate are extracted from the aligned reads. The GBDT framework is trained on a training dataset. Applying the trained framework to the selected features of each candidate yields a prediction of whether that candidate is a true or a false positive. To verify the performance of the proposed approach, we conducted several groups of experiments on both real RNA-seq datasets and a series of simulated datasets with different preset configurations. The results demonstrated that CIRCPlus2 clearly improved specificity while maintaining high sensitivity. CONCLUSIONS: Filtering false positives is important in RNA-seq data analysis pipelines. A machine learning framework is suitable for solving this filtering problem. CIRCPlus2 is an efficient approach for distinguishing false-positive circRNAs from real ones.
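The filtering idea in the description (train a gradient-boosted tree model on per-candidate features, then predict true/false positives) can be sketched in a few lines. The following is only an illustrative toy in pure Python, not the CIRCPlus2 implementation: it boosts depth-1 regression trees (stumps) under a logistic loss, and the two feature names (`junction_reads`, `paired_end_support`), their values, and all hyperparameters are assumptions made up for the example. The record does not list the features CIRCPlus2 actually uses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Depth-1 regression tree: pick the (feature, threshold) split that
    minimises squared error on the residuals; leaves hold the mean residual."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lmean, rmean)
    return None if best is None else best[1:]

def train_gbdt(X, y, n_rounds=50, learning_rate=0.5):
    """Gradient boosting with logistic loss; both classes must be present in y."""
    pos = sum(y) / len(y)
    base = math.log(pos / (1.0 - pos))          # initial log-odds
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss = label minus predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:                        # no feature has two distinct values
            break
        j, thr, lval, rval = stump
        stumps.append(stump)
        scores = [s + learning_rate * (lval if row[j] <= thr else rval)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps

def predict_proba(model, row):
    """Probability that a candidate is a real circRNA under the toy model."""
    base, learning_rate, stumps = model
    score = base + learning_rate * sum(
        lval if row[j] <= thr else rval for j, thr, lval, rval in stumps)
    return sigmoid(score)

# Hypothetical candidates: (junction_reads, paired_end_support).
# A candidate is labelled real (1) only when BOTH signals are high, so neither
# feature alone separates the classes, but their combination does.
X = [(2, 0.1), (2, 0.9), (9, 0.1), (9, 0.9)] * 5
y = [0, 0, 0, 1] * 5

model = train_gbdt(X, y)
for row in X[:4]:
    label = "real" if predict_proba(model, row) > 0.5 else "false positive"
    print(row, label)
```

In this toy, a threshold on either feature alone always mixes a real candidate with a false one, while the boosted sum of stumps on both features separates them. In practice one would more likely use an established GBDT library and features extracted from the aligned reads.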
format Online
Article
Text
id pubmed-7346313
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7346313 2020-07-14 BMC Med Inform Decis Mak Research BioMed Central 2020-07-09 /pmc/articles/PMC7346313/ /pubmed/32646420 http://dx.doi.org/10.1186/s12911-020-1117-0 Text en © The Author(s). 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A machine learning framework for accurately recognizing circular RNAs for clinical decision-supporting
topic Research