
MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome

Alternative splicing (AS) is an essential post-transcriptional mechanism that regulates many biological processes. However, identifying comprehensive types of AS events without guidance from a reference genome is still a challenge. Here, we proposed a novel method, MkcDBGAS, to identify all seven ty...


Bibliographic Details
Main Authors: Zhang, Quanbao; Cao, Lei; Song, Hongtao; Lin, Kui; Pang, Erli
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Problem Solving Protocol
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576019/
https://www.ncbi.nlm.nih.gov/pubmed/37833843
http://dx.doi.org/10.1093/bib/bbad367
_version_ 1785121034893000704
author Zhang, Quanbao
Cao, Lei
Song, Hongtao
Lin, Kui
Pang, Erli
author_facet Zhang, Quanbao
Cao, Lei
Song, Hongtao
Lin, Kui
Pang, Erli
author_sort Zhang, Quanbao
collection PubMed
description Alternative splicing (AS) is an essential post-transcriptional mechanism that regulates many biological processes. However, identifying comprehensive types of AS events without guidance from a reference genome is still a challenge. Here, we proposed a novel method, MkcDBGAS, to identify all seven types of AS events using transcriptome alone, without a reference genome. MkcDBGAS, modeled by full-length transcripts of human and Arabidopsis thaliana, consists of three modules. In the first module, MkcDBGAS, for the first time, uses a colored de Bruijn graph with dynamic- and mixed- kmers to identify bubbles generated by AS with precision higher than 98.17% and detect AS types overlooked by other tools. In the second module, to further classify types of AS, MkcDBGAS added the motifs of exons to construct the feature matrix followed by the XGBoost-based classifier with the accuracy of classification greater than 93.40%, which outperformed other widely used machine learning models and the state-of-the-art methods. Highly scalable, MkcDBGAS performed well when applied to Iso-Seq data of Amborella and transcriptome of mouse. In the third module, MkcDBGAS provides the analysis of differential splicing across multiple biological conditions when RNA-sequencing data is available. MkcDBGAS is the first accurate and scalable method for detecting all seven types of AS events using the transcriptome alone, which will greatly empower the studies of AS in a wider field.
format Online
Article
Text
id pubmed-10576019
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10576019 2023-10-15 MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome Zhang, Quanbao Cao, Lei Song, Hongtao Lin, Kui Pang, Erli Brief Bioinform Problem Solving Protocol Alternative splicing (AS) is an essential post-transcriptional mechanism that regulates many biological processes. However, identifying comprehensive types of AS events without guidance from a reference genome is still a challenge. Here, we proposed a novel method, MkcDBGAS, to identify all seven types of AS events using transcriptome alone, without a reference genome. MkcDBGAS, modeled by full-length transcripts of human and Arabidopsis thaliana, consists of three modules. In the first module, MkcDBGAS, for the first time, uses a colored de Bruijn graph with dynamic- and mixed- kmers to identify bubbles generated by AS with precision higher than 98.17% and detect AS types overlooked by other tools. In the second module, to further classify types of AS, MkcDBGAS added the motifs of exons to construct the feature matrix followed by the XGBoost-based classifier with the accuracy of classification greater than 93.40%, which outperformed other widely used machine learning models and the state-of-the-art methods. Highly scalable, MkcDBGAS performed well when applied to Iso-Seq data of Amborella and transcriptome of mouse. In the third module, MkcDBGAS provides the analysis of differential splicing across multiple biological conditions when RNA-sequencing data is available. MkcDBGAS is the first accurate and scalable method for detecting all seven types of AS events using the transcriptome alone, which will greatly empower the studies of AS in a wider field. Oxford University Press 2023-10-13 /pmc/articles/PMC10576019/ /pubmed/37833843 http://dx.doi.org/10.1093/bib/bbad367 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Problem Solving Protocol
Zhang, Quanbao
Cao, Lei
Song, Hongtao
Lin, Kui
Pang, Erli
MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome
title MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome
title_full MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome
title_fullStr MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome
title_full_unstemmed MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome
title_short MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome
title_sort mkcdbgas: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576019/
https://www.ncbi.nlm.nih.gov/pubmed/37833843
http://dx.doi.org/10.1093/bib/bbad367