Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data
BACKGROUND: Fusion genes are typically identified by RNA sequencing (RNA-seq) without elucidating the causal genomic breakpoints. However, non–poly(A)-enriched RNA-seq contains large proportions of intronic reads that also span genomic breakpoints. RESULTS: We have developed an algorithm, Dr. Disco,...
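Main authors: | Hoogstrate, Youri; Komor, Malgorzata A; Böttcher, René; van Riet, Job; van de Werken, Harmen J G; van Lieshout, Stef; Hoffmann, Ralf; van den Broek, Evert; Bolijn, Anne S; Dits, Natasja; Sie, Daoud; van der Meer, David; Pepers, Floor; Bangma, Chris H; van Leenders, Geert J L H; Smid, Marcel; French, Pim J; Martens, John W M; van Workum, Wilbert; van der Spek, Peter J; Janssen, Bart; Caldenhoven, Eric; Rausch, Christian; de Jong, Mark; Stubbs, Andrew P; Meijer, Gerrit A; Fijneman, Remond J A; Jenster, Guido W |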
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8673554/ https://www.ncbi.nlm.nih.gov/pubmed/34891161 http://dx.doi.org/10.1093/gigascience/giab080 |
_version_ | 1784615476633010176 |
---|---|
author | Hoogstrate, Youri; Komor, Malgorzata A; Böttcher, René; van Riet, Job; van de Werken, Harmen J G; van Lieshout, Stef; Hoffmann, Ralf; van den Broek, Evert; Bolijn, Anne S; Dits, Natasja; Sie, Daoud; van der Meer, David; Pepers, Floor; Bangma, Chris H; van Leenders, Geert J L H; Smid, Marcel; French, Pim J; Martens, John W M; van Workum, Wilbert; van der Spek, Peter J; Janssen, Bart; Caldenhoven, Eric; Rausch, Christian; de Jong, Mark; Stubbs, Andrew P; Meijer, Gerrit A; Fijneman, Remond J A; Jenster, Guido W |
author_facet | Hoogstrate, Youri; Komor, Malgorzata A; Böttcher, René; van Riet, Job; van de Werken, Harmen J G; van Lieshout, Stef; Hoffmann, Ralf; van den Broek, Evert; Bolijn, Anne S; Dits, Natasja; Sie, Daoud; van der Meer, David; Pepers, Floor; Bangma, Chris H; van Leenders, Geert J L H; Smid, Marcel; French, Pim J; Martens, John W M; van Workum, Wilbert; van der Spek, Peter J; Janssen, Bart; Caldenhoven, Eric; Rausch, Christian; de Jong, Mark; Stubbs, Andrew P; Meijer, Gerrit A; Fijneman, Remond J A; Jenster, Guido W |
author_sort | Hoogstrate, Youri |
collection | PubMed |
description | BACKGROUND: Fusion genes are typically identified by RNA sequencing (RNA-seq) without elucidating the causal genomic breakpoints. However, non–poly(A)-enriched RNA-seq contains large proportions of intronic reads that also span genomic breakpoints. RESULTS: We have developed an algorithm, Dr. Disco, that searches for fusion transcripts by taking an entire reference genome into account as search space. This includes exons but also introns, intergenic regions, and sequences that do not meet splice junction motifs. Using 1,275 RNA-seq samples, we investigated to what extent genomic breakpoints can be extracted from RNA-seq data and their implications regarding poly(A)-enriched and ribosomal RNA–minus RNA-seq data. Comparison with whole-genome sequencing data revealed that most genomic breakpoints are not, or minimally, transcribed while, in contrast, the genomic breakpoints of all 32 TMPRSS2-ERG–positive tumours were present at RNA level. We also revealed tumours in which the ERG breakpoint was located before ERG, which co-existed with additional deletions and messenger RNA that incorporated intergenic cryptic exons. In breast cancer we identified rearrangement hot spots near CCND1 and in glioma near CDK4 and MDM2 and could directly associate this with increased expression. Furthermore, in all datasets we find fusions to intergenic regions, often spanning multiple cryptic exons that potentially encode neo-antigens. Thus, fusion transcripts other than classical gene-to-gene fusions are prominently present and can be identified using RNA-seq. CONCLUSION: By using the full potential of non–poly(A)-enriched RNA-seq data, sophisticated analysis can reliably identify expressed genomic breakpoints and their transcriptional effects. |
format | Online Article Text |
id | pubmed-8673554 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8673554 2021-12-16 Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data Hoogstrate, Youri Komor, Malgorzata A Böttcher, René van Riet, Job van de Werken, Harmen J G van Lieshout, Stef Hoffmann, Ralf van den Broek, Evert Bolijn, Anne S Dits, Natasja Sie, Daoud van der Meer, David Pepers, Floor Bangma, Chris H van Leenders, Geert J L H Smid, Marcel French, Pim J Martens, John W M van Workum, Wilbert van der Spek, Peter J Janssen, Bart Caldenhoven, Eric Rausch, Christian de Jong, Mark Stubbs, Andrew P Meijer, Gerrit A Fijneman, Remond J A Jenster, Guido W Gigascience Technical Note BACKGROUND: Fusion genes are typically identified by RNA sequencing (RNA-seq) without elucidating the causal genomic breakpoints. However, non–poly(A)-enriched RNA-seq contains large proportions of intronic reads that also span genomic breakpoints. RESULTS: We have developed an algorithm, Dr. Disco, that searches for fusion transcripts by taking an entire reference genome into account as search space. This includes exons but also introns, intergenic regions, and sequences that do not meet splice junction motifs. Using 1,275 RNA-seq samples, we investigated to what extent genomic breakpoints can be extracted from RNA-seq data and their implications regarding poly(A)-enriched and ribosomal RNA–minus RNA-seq data. Comparison with whole-genome sequencing data revealed that most genomic breakpoints are not, or minimally, transcribed while, in contrast, the genomic breakpoints of all 32 TMPRSS2-ERG–positive tumours were present at RNA level. We also revealed tumours in which the ERG breakpoint was located before ERG, which co-existed with additional deletions and messenger RNA that incorporated intergenic cryptic exons. In breast cancer we identified rearrangement hot spots near CCND1 and in glioma near CDK4 and MDM2 and could directly associate this with increased expression. Furthermore, in all datasets we find fusions to intergenic regions, often spanning multiple cryptic exons that potentially encode neo-antigens. Thus, fusion transcripts other than classical gene-to-gene fusions are prominently present and can be identified using RNA-seq. CONCLUSION: By using the full potential of non–poly(A)-enriched RNA-seq data, sophisticated analysis can reliably identify expressed genomic breakpoints and their transcriptional effects. Oxford University Press 2021-12-09 /pmc/articles/PMC8673554/ /pubmed/34891161 http://dx.doi.org/10.1093/gigascience/giab080 Text en © The Author(s) 2021. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Technical Note Hoogstrate, Youri Komor, Malgorzata A Böttcher, René van Riet, Job van de Werken, Harmen J G van Lieshout, Stef Hoffmann, Ralf van den Broek, Evert Bolijn, Anne S Dits, Natasja Sie, Daoud van der Meer, David Pepers, Floor Bangma, Chris H van Leenders, Geert J L H Smid, Marcel French, Pim J Martens, John W M van Workum, Wilbert van der Spek, Peter J Janssen, Bart Caldenhoven, Eric Rausch, Christian de Jong, Mark Stubbs, Andrew P Meijer, Gerrit A Fijneman, Remond J A Jenster, Guido W Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data |
title | Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data |
title_full | Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data |
title_fullStr | Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data |
title_full_unstemmed | Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data |
title_short | Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data |
title_sort | fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal rna–minus rna sequencing data |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8673554/ https://www.ncbi.nlm.nih.gov/pubmed/34891161 http://dx.doi.org/10.1093/gigascience/giab080 |
work_keys_str_mv | AT hoogstrateyouri fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT komormalgorzataa fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT bottcherrene fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT vanrietjob fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT vandewerkenharmenjg fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT vanlieshoutstef fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT hoffmannralf fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT vandenbroekevert fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT bolijnannes fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT ditsnatasja fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT siedaoud fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT vandermeerdavid fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT pepersfloor fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT bangmachrish fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT vanleendersgeertjlh fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT smidmarcel fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT frenchpimj fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT martensjohnwm fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT vanworkumwilbert fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT vanderspekpeterj fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT janssenbart fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT caldenhoveneric fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT rauschchristian fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT dejongmark fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT stubbsandrewp fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT meijergerrita fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT fijnemanremondja fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata AT jensterguidow fusiontranscriptsandtheirgenomicbreakpointsinpolyadenylatedandribosomalrnaminusrnasequencingdata |