Splicing signature database development to delineate cancer pathways using literature mining and transcriptome machine learning

Alternative splicing (AS) events modulate certain pathways and phenotypic plasticity in cancer. Although previous studies have computationally analyzed splicing events, it is still a challenge to uncover biological functions induced by reliable AS events from tremendous candidates. To provide essent...

Bibliographic Details
Main Authors: Lee, Kyubin, Hyung, Daejin, Cho, Soo Young, Yu, Namhee, Hong, Sewha, Kim, Jihyun, Kim, Sunshin, Han, Ji-Youn, Park, Charny
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10023904/
https://www.ncbi.nlm.nih.gov/pubmed/36942103
http://dx.doi.org/10.1016/j.csbj.2023.02.052
author Lee, Kyubin
Hyung, Daejin
Cho, Soo Young
Yu, Namhee
Hong, Sewha
Kim, Jihyun
Kim, Sunshin
Han, Ji-Youn
Park, Charny
author_sort Lee, Kyubin
collection PubMed
description Alternative splicing (AS) events modulate certain pathways and phenotypic plasticity in cancer. Although previous studies have computationally analyzed splicing events, it remains a challenge to identify reliable AS events among the many candidates and to uncover the biological functions they induce. To provide essential splicing event signatures for assessing pathway regulation, we developed a database from two data sources: (i) reported literature and (ii) cancer transcriptome profiles. The former comprises knowledge-based splicing signatures collected from 63,229 PubMed abstracts using natural language processing and extracted for 202 pathways. The latter comprises machine learning-based splicing signatures identified from pan-cancer transcriptomes for 16 cancer types and 42 pathways. We established six different learning models to classify pathway activities, using splicing profiles as the learning dataset. The top-ranked AS events by model feature importance became the signature for each pathway. To validate our learning results, we evaluated them by (i) performance metrics, (ii) differential AS sets acquired from external datasets, and (iii) our knowledge-based signatures. The area under the receiver operating characteristic curve values of the learning models did not differ drastically. However, the random-forest model clearly performed best when compared with the AS sets identified from external datasets and our knowledge-based signatures. Therefore, we used the signatures obtained from the random-forest model. Our database provides the clinical characteristics of the AS signatures, including survival tests, molecular subtypes, and tumor microenvironment. Regulation by splicing factors was additionally investigated. The database supports a retrieval and visualization system for the developed signatures.
format Online
Article
Text
id pubmed-10023904
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-100239042023-03-19 Splicing signature database development to delineate cancer pathways using literature mining and transcriptome machine learning Lee, Kyubin Hyung, Daejin Cho, Soo Young Yu, Namhee Hong, Sewha Kim, Jihyun Kim, Sunshin Han, Ji-Youn Park, Charny Comput Struct Biotechnol J Research Article Alternative splicing (AS) events modulate certain pathways and phenotypic plasticity in cancer. Although previous studies have computationally analyzed splicing events, it is still a challenge to uncover biological functions induced by reliable AS events from tremendous candidates. To provide essential splicing event signatures to assess pathway regulation, we developed a database by collecting two datasets: (i) reported literature and (ii) cancer transcriptome profile. The former includes knowledge-based splicing signatures collected from 63,229 PubMed abstracts using natural language processing, extracted for 202 pathways. The latter is the machine learning-based splicing signatures identified from pan-cancer transcriptome for 16 cancer types and 42 pathways. We established six different learning models to classify pathway activities from splicing profiles as a learning dataset. Top-ranked AS events by learning model feature importance became the signature for each pathway. To validate our learning results, we performed evaluations by (i) performance metrics, (ii) differential AS sets acquired from external datasets, and (iii) our knowledge-based signatures. The area under the receiver operating characteristic values of the learning models did not exhibit any drastic difference. However, random-forest distinctly presented the best performance to compare with the AS sets identified from external datasets and our knowledge-based signatures. Therefore, we used the signatures obtained from the random-forest model. 
Our database provided the clinical characteristics of the AS signatures, including survival test, molecular subtype, and tumor microenvironment. The regulation by splicing factors was additionally investigated. Our database for developed signatures supported retrieval and visualization system. Research Network of Computational and Structural Biotechnology 2023-03-02 /pmc/articles/PMC10023904/ /pubmed/36942103 http://dx.doi.org/10.1016/j.csbj.2023.02.052 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Splicing signature database development to delineate cancer pathways using literature mining and transcriptome machine learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10023904/
https://www.ncbi.nlm.nih.gov/pubmed/36942103
http://dx.doi.org/10.1016/j.csbj.2023.02.052