A machine learning framework for accurately recognizing circular RNAs for clinical decision-supporting
BACKGROUND: Circular RNAs (circRNAs) are RNA molecules that lack poly(A) tails and form a closed-loop structure. Recent studies have emphasized that some circRNAs have functions different from those of canonical transcripts and are associated with complex diseases. Several computational m...
Main authors: | Wang, Yidan; Zhang, Xuanping; Wang, Tao; Xing, Jinchun; Wu, Zhun; Li, Wei; Wang, Jiayin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7346313/ https://www.ncbi.nlm.nih.gov/pubmed/32646420 http://dx.doi.org/10.1186/s12911-020-1117-0 |
_version_ | 1783556381707599872 |
---|---|
author | Wang, Yidan Zhang, Xuanping Wang, Tao Xing, Jinchun Wu, Zhun Li, Wei Wang, Jiayin |
author_sort | Wang, Yidan |
collection | PubMed |
description | BACKGROUND: Circular RNAs (circRNAs) are RNA molecules that lack poly(A) tails and form a closed-loop structure. Recent studies have emphasized that some circRNAs have functions different from those of canonical transcripts and are associated with complex diseases. Several computational methods have been developed for detecting circRNAs from RNA-seq data. However, the existing methods favor high-sensitivity strategies, which introduce many false positives. Thus, in a clinical decision-support system, a comprehensive filtering approach is needed to accurately recognize real circRNAs for decision models. METHODS: In this paper, we first reviewed the detection strategies of the existing methods. Based on the features available from RNA-seq data, we showed that no single feature (data signal) selected by the existing strategies can accurately distinguish a circRNA. However, we found that some combinations of those features (data signals) could be used as signatures for recognizing circRNAs. To avoid the high computational complexity of the combinatorial optimization problem, we present CIRCPlus2, which adopts a machine learning framework to recognize real circRNAs according to multiple data signals captured from RNA-seq data. After comparing multiple machine learning frameworks, we adopted a Gradient Boosting Decision Tree (GBDT) framework for CIRCPlus2. RESULTS: Given a set of candidate circRNAs reported by any existing detection tool(s), the features of each candidate are extracted from the aligned reads. The GBDT framework is trained on a training dataset. Applying the trained framework to the selected features yields a true-positive or false-positive prediction for each candidate. To verify the performance of the proposed approach, we conducted several groups of experiments on both real RNA-seq datasets and a series of simulated datasets with different preset configurations. The results demonstrated that CIRCPlus2 clearly improved specificity while maintaining high sensitivity. CONCLUSIONS: Filtering false positives is important in RNA-seq data analysis pipelines. A machine learning framework is suitable for solving this filtering problem. CIRCPlus2 is an efficient approach for distinguishing false-positive circRNAs from real ones. |
format | Online Article Text |
id | pubmed-7346313 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7346313 2020-07-14 A machine learning framework for accurately recognizing circular RNAs for clinical decision-supporting Wang, Yidan Zhang, Xuanping Wang, Tao Xing, Jinchun Wu, Zhun Li, Wei Wang, Jiayin BMC Med Inform Decis Mak Research BioMed Central 2020-07-09 /pmc/articles/PMC7346313/ /pubmed/32646420 http://dx.doi.org/10.1186/s12911-020-1117-0 Text en © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Wang, Yidan Zhang, Xuanping Wang, Tao Xing, Jinchun Wu, Zhun Li, Wei Wang, Jiayin A machine learning framework for accurately recognizing circular RNAs for clinical decision-supporting |
title | A machine learning framework for accurately recognizing circular RNAs for clinical decision-supporting |
title_sort | machine learning framework for accurately recognizing circular rnas for clinical decision-supporting |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7346313/ https://www.ncbi.nlm.nih.gov/pubmed/32646420 http://dx.doi.org/10.1186/s12911-020-1117-0 |
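
The abstract describes training a gradient boosting model on several per-candidate data signals and reporting sensitivity and specificity. The record does not give the actual features, hyperparameters, or code of CIRCPlus2, so the following is only a minimal sketch of the general technique under stated assumptions: the two features (`junction_support`, `alignment_quality`), the synthetic labels, and every function name are hypothetical, and the booster is a simplified gradient boosting on the logistic loss with depth-1 trees (stumps), not the authors' implementation. The synthetic data is built so that neither feature alone identifies a true candidate, while their combination does.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) with least squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def stump_predict(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm


def fit_gbdt(X, y, n_rounds=30, learning_rate=1.0):
    """Simplified boosting: each stump fits the negative gradient of the log loss."""
    p = sum(y) / len(y)
    base = math.log(p / (1.0 - p))  # initial score: log-odds of the positive class
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict(model, row):
    """Return 1 (predicted real circRNA) or 0 (predicted false positive)."""
    base, lr, stumps = model
    score = base + lr * sum(stump_predict(s, row) for s in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


def sens_spec(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def make_candidates(rng, n):
    """Synthetic candidates: 'real' only when BOTH hypothetical signals are high."""
    X, y = [], []
    for _ in range(n):
        junction_support = rng.random()
        alignment_quality = rng.random()
        X.append((junction_support, alignment_quality))
        y.append(1 if junction_support > 0.5 and alignment_quality > 0.5 else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = make_candidates(rng, 120)
X_test, y_test = make_candidates(rng, 60)

model = fit_gbdt(X_train, y_train)
y_pred = [predict(model, row) for row in X_test]
sensitivity, specificity = sens_spec(y_test, y_pred)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

In practice a maintained library implementation of gradient boosting would normally replace the hand-written booster; the standard-library version is used here only so the sketch runs without extra dependencies.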