Systematic identification of intron retention associated variants from massive publicly available transcriptome sequencing data
Many disease-associated genomic variants disrupt gene function through abnormal splicing. With the advancement of genomic medicine, identifying disease-associated splicing associated variants has become more important than ever. Most bioinformatics approaches to detect splicing associated variants r...
Main authors: | Shiraishi, Yuichi; Okada, Ai; Chiba, Kenichi; Kawachi, Asuka; Omori, Ikuko; Mateos, Raúl Nicolás; Iida, Naoko; Yamauchi, Hirofumi; Kosaki, Kenjiro; Yoshimi, Akihide |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9522810/ https://www.ncbi.nlm.nih.gov/pubmed/36175409 http://dx.doi.org/10.1038/s41467-022-32887-9 |
_version_ | 1784800138201399296 |
---|---|
author | Shiraishi, Yuichi Okada, Ai Chiba, Kenichi Kawachi, Asuka Omori, Ikuko Mateos, Raúl Nicolás Iida, Naoko Yamauchi, Hirofumi Kosaki, Kenjiro Yoshimi, Akihide |
author_sort | Shiraishi, Yuichi |
collection | PubMed |
description | Many disease-associated genomic variants disrupt gene function through abnormal splicing. With the advancement of genomic medicine, identifying disease-associated, splicing-associated variants has become more important than ever. Most bioinformatics approaches to detect splicing-associated variants require both genome and transcriptome data. However, few datasets provide both. In this study, we develop a methodology to detect genomic variants that cause splicing changes (more specifically, intron retention), using transcriptome sequencing data alone. After evaluating its sensitivity and precision, we apply it to 230,988 transcriptome sequencing datasets from a publicly available repository and identify 27,049 intron retention associated variants (IRAVs). In addition, by exploring positional relationships with variants registered in existing disease databases, we extract 3,000 putative disease-associated IRAVs, which range from cancer drivers to variants linked with autosomal recessive disorders. The in-silico screening framework demonstrates the possibility of near-automatically acquiring medical knowledge, making the most of massively accumulated publicly available sequencing data. Collections of IRAVs identified in this study are available through IRAVDB (https://iravdb.io/). |
format | Online Article Text |
id | pubmed-9522810 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-95228102022-10-01 Systematic identification of intron retention associated variants from massive publicly available transcriptome sequencing data Shiraishi, Yuichi Okada, Ai Chiba, Kenichi Kawachi, Asuka Omori, Ikuko Mateos, Raúl Nicolás Iida, Naoko Yamauchi, Hirofumi Kosaki, Kenjiro Yoshimi, Akihide Nat Commun Article Many disease-associated genomic variants disrupt gene function through abnormal splicing. With the advancement of genomic medicine, identifying disease-associated splicing associated variants has become more important than ever. Most bioinformatics approaches to detect splicing associated variants require both genome and transcriptomic data. However, there are not many datasets where both of them are available. In this study, we develop a methodology to detect genomic variants that cause splicing changes (more specifically, intron retention), using transcriptome sequencing data alone. After evaluating its sensitivity and precision, we apply it to 230,988 transcriptome sequencing data from the publicly available repository and identified 27,049 intron retention associated variants (IRAVs). In addition, by exploring positional relationships with variants registered in existing disease databases, we extract 3,000 putative disease-associated IRAVs, which range from cancer drivers to variants linked with autosomal recessive disorders. The in-silico screening framework demonstrates the possibility of near-automatically acquiring medical knowledge, making the most of massively accumulated publicly available sequencing data. Collections of IRAVs identified in this study are available through IRAVDB (https://iravdb.io/). 
Nature Publishing Group UK 2022-09-29 /pmc/articles/PMC9522810/ /pubmed/36175409 http://dx.doi.org/10.1038/s41467-022-32887-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
title | Systematic identification of intron retention associated variants from massive publicly available transcriptome sequencing data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9522810/ https://www.ncbi.nlm.nih.gov/pubmed/36175409 http://dx.doi.org/10.1038/s41467-022-32887-9 |