Systematic identification of intron retention associated variants from massive publicly available transcriptome sequencing data

Bibliographic Details
Main Authors: Shiraishi, Yuichi, Okada, Ai, Chiba, Kenichi, Kawachi, Asuka, Omori, Ikuko, Mateos, Raúl Nicolás, Iida, Naoko, Yamauchi, Hirofumi, Kosaki, Kenjiro, Yoshimi, Akihide
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9522810/
https://www.ncbi.nlm.nih.gov/pubmed/36175409
http://dx.doi.org/10.1038/s41467-022-32887-9
_version_ 1784800138201399296
author Shiraishi, Yuichi
Okada, Ai
Chiba, Kenichi
Kawachi, Asuka
Omori, Ikuko
Mateos, Raúl Nicolás
Iida, Naoko
Yamauchi, Hirofumi
Kosaki, Kenjiro
Yoshimi, Akihide
author_sort Shiraishi, Yuichi
collection PubMed
description Many disease-associated genomic variants disrupt gene function through abnormal splicing. With the advancement of genomic medicine, identifying disease-associated splicing associated variants has become more important than ever. Most bioinformatics approaches to detect splicing associated variants require both genome and transcriptomic data. However, there are not many datasets where both of them are available. In this study, we develop a methodology to detect genomic variants that cause splicing changes (more specifically, intron retention), using transcriptome sequencing data alone. After evaluating its sensitivity and precision, we apply it to 230,988 transcriptome sequencing data from the publicly available repository and identified 27,049 intron retention associated variants (IRAVs). In addition, by exploring positional relationships with variants registered in existing disease databases, we extract 3,000 putative disease-associated IRAVs, which range from cancer drivers to variants linked with autosomal recessive disorders. The in-silico screening framework demonstrates the possibility of near-automatically acquiring medical knowledge, making the most of massively accumulated publicly available sequencing data. Collections of IRAVs identified in this study are available through IRAVDB (https://iravdb.io/).
format Online
Article
Text
id pubmed-9522810
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-95228102022-10-01 Systematic identification of intron retention associated variants from massive publicly available transcriptome sequencing data Shiraishi, Yuichi Okada, Ai Chiba, Kenichi Kawachi, Asuka Omori, Ikuko Mateos, Raúl Nicolás Iida, Naoko Yamauchi, Hirofumi Kosaki, Kenjiro Yoshimi, Akihide Nat Commun Article Many disease-associated genomic variants disrupt gene function through abnormal splicing. With the advancement of genomic medicine, identifying disease-associated splicing associated variants has become more important than ever. Most bioinformatics approaches to detect splicing associated variants require both genome and transcriptomic data. However, there are not many datasets where both of them are available. In this study, we develop a methodology to detect genomic variants that cause splicing changes (more specifically, intron retention), using transcriptome sequencing data alone. After evaluating its sensitivity and precision, we apply it to 230,988 transcriptome sequencing data from the publicly available repository and identified 27,049 intron retention associated variants (IRAVs). In addition, by exploring positional relationships with variants registered in existing disease databases, we extract 3,000 putative disease-associated IRAVs, which range from cancer drivers to variants linked with autosomal recessive disorders. The in-silico screening framework demonstrates the possibility of near-automatically acquiring medical knowledge, making the most of massively accumulated publicly available sequencing data. Collections of IRAVs identified in this study are available through IRAVDB (https://iravdb.io/). 
Nature Publishing Group UK 2022-09-29 /pmc/articles/PMC9522810/ /pubmed/36175409 http://dx.doi.org/10.1038/s41467-022-32887-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Systematic identification of intron retention associated variants from massive publicly available transcriptome sequencing data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9522810/
https://www.ncbi.nlm.nih.gov/pubmed/36175409
http://dx.doi.org/10.1038/s41467-022-32887-9