MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome
Alternative splicing (AS) is an essential post-transcriptional mechanism that regulates many biological processes. However, identifying comprehensive types of AS events without guidance from a reference genome is still a challenge. Here, we proposed a novel method, MkcDBGAS, to identify all seven ty...
Main authors: | Zhang, Quanbao; Cao, Lei; Song, Hongtao; Lin, Kui; Pang, Erli |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | Problem Solving Protocol |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576019/ https://www.ncbi.nlm.nih.gov/pubmed/37833843 http://dx.doi.org/10.1093/bib/bbad367 |
_version_ | 1785121034893000704 |
---|---|
author | Zhang, Quanbao Cao, Lei Song, Hongtao Lin, Kui Pang, Erli |
author_facet | Zhang, Quanbao Cao, Lei Song, Hongtao Lin, Kui Pang, Erli |
author_sort | Zhang, Quanbao |
collection | PubMed |
description | Alternative splicing (AS) is an essential post-transcriptional mechanism that regulates many biological processes. However, identifying comprehensive types of AS events without guidance from a reference genome is still a challenge. Here, we proposed a novel method, MkcDBGAS, to identify all seven types of AS events using transcriptome alone, without a reference genome. MkcDBGAS, modeled by full-length transcripts of human and Arabidopsis thaliana, consists of three modules. In the first module, MkcDBGAS, for the first time, uses a colored de Bruijn graph with dynamic- and mixed- kmers to identify bubbles generated by AS with precision higher than 98.17% and detect AS types overlooked by other tools. In the second module, to further classify types of AS, MkcDBGAS added the motifs of exons to construct the feature matrix followed by the XGBoost-based classifier with the accuracy of classification greater than 93.40%, which outperformed other widely used machine learning models and the state-of-the-art methods. Highly scalable, MkcDBGAS performed well when applied to Iso-Seq data of Amborella and transcriptome of mouse. In the third module, MkcDBGAS provides the analysis of differential splicing across multiple biological conditions when RNA-sequencing data is available. MkcDBGAS is the first accurate and scalable method for detecting all seven types of AS events using the transcriptome alone, which will greatly empower the studies of AS in a wider field. |
format | Online Article Text |
id | pubmed-10576019 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-105760192023-10-15 MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome Zhang, Quanbao Cao, Lei Song, Hongtao Lin, Kui Pang, Erli Brief Bioinform Problem Solving Protocol Alternative splicing (AS) is an essential post-transcriptional mechanism that regulates many biological processes. However, identifying comprehensive types of AS events without guidance from a reference genome is still a challenge. Here, we proposed a novel method, MkcDBGAS, to identify all seven types of AS events using transcriptome alone, without a reference genome. MkcDBGAS, modeled by full-length transcripts of human and Arabidopsis thaliana, consists of three modules. In the first module, MkcDBGAS, for the first time, uses a colored de Bruijn graph with dynamic- and mixed- kmers to identify bubbles generated by AS with precision higher than 98.17% and detect AS types overlooked by other tools. In the second module, to further classify types of AS, MkcDBGAS added the motifs of exons to construct the feature matrix followed by the XGBoost-based classifier with the accuracy of classification greater than 93.40%, which outperformed other widely used machine learning models and the state-of-the-art methods. Highly scalable, MkcDBGAS performed well when applied to Iso-Seq data of Amborella and transcriptome of mouse. In the third module, MkcDBGAS provides the analysis of differential splicing across multiple biological conditions when RNA-sequencing data is available. MkcDBGAS is the first accurate and scalable method for detecting all seven types of AS events using the transcriptome alone, which will greatly empower the studies of AS in a wider field. Oxford University Press 2023-10-13 /pmc/articles/PMC10576019/ /pubmed/37833843 http://dx.doi.org/10.1093/bib/bbad367 Text en © The Author(s) 2023. Published by Oxford University Press. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Problem Solving Protocol Zhang, Quanbao Cao, Lei Song, Hongtao Lin, Kui Pang, Erli MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome |
title | MkcDBGAS: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome |
title_sort | mkcdbgas: a reference-free approach to identify comprehensive alternative splicing events in a transcriptome |
topic | Problem Solving Protocol |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576019/ https://www.ncbi.nlm.nih.gov/pubmed/37833843 http://dx.doi.org/10.1093/bib/bbad367 |
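
The description above says the first module finds bubbles produced by AS in a colored de Bruijn graph. As a rough illustration of that general idea only, the sketch below builds a colored de Bruijn graph from two toy transcripts and reports simple bubbles. It is not the MkcDBGAS implementation: it uses a single fixed `k` rather than dynamic and mixed k-mers, and the function names (`build_colored_dbg`, `find_simple_bubbles`), the toy exon sequences, and `k = 5` are all assumptions made for this example.

```python
from collections import defaultdict


def build_colored_dbg(transcripts, k):
    """Build a colored de Bruijn graph with k-mers as nodes.

    Hypothetical helper: each k-mer is "colored" with the names of the
    transcripts that contain it; edges link consecutive k-mers.
    """
    colors = defaultdict(set)
    succ = defaultdict(set)
    pred = defaultdict(set)
    for name, seq in transcripts.items():
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for kmer in kmers:
            colors[kmer].add(name)
        for a, b in zip(kmers, kmers[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return dict(colors), dict(succ), dict(pred)


def find_simple_bubbles(succ, pred, max_steps=1000):
    """Return (source, sink) pairs where all branches leaving `source`
    follow unbranched paths that reconverge on the same `sink`."""
    bubbles = []
    for source in sorted(succ):
        if len(succ[source]) < 2:
            continue  # not a branching node
        sinks = set()
        for node in succ[source]:
            steps = 0
            # Walk forward along the unbranched path of this branch.
            while (len(pred.get(node, ())) < 2
                   and len(succ.get(node, ())) == 1
                   and steps < max_steps):
                node = next(iter(succ[node]))
                steps += 1
            # A merge node (two or more predecessors) closes the branch.
            sinks.add(node if len(pred.get(node, ())) >= 2 else None)
        if len(sinks) == 1 and None not in sinks:
            bubbles.append((source, sinks.pop()))
    return bubbles


# Toy exon-skipping example: t1 keeps the middle exon, t2 skips it.
exon_a, exon_b, exon_c = "ATGGCGTACG", "TTTAAACCCGGG", "CAGTCTGATC"
transcripts = {"t1": exon_a + exon_b + exon_c, "t2": exon_a + exon_c}

colors, succ, pred = build_colored_dbg(transcripts, k=5)
bubbles = find_simple_bubbles(succ, pred)
print(bubbles)
```

In this toy case the two transcripts share the flanking k-mers and diverge only around the skipped exon, so the graph contains one bubble whose two paths carry different colors. Classifying the AS type from such a bubble, which the record attributes to an XGBoost-based second module, is outside the scope of this sketch.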