Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data

Bibliographic Details
Main authors: Hoogstrate, Youri; Komor, Malgorzata A; Böttcher, René; van Riet, Job; van de Werken, Harmen J G; van Lieshout, Stef; Hoffmann, Ralf; van den Broek, Evert; Bolijn, Anne S; Dits, Natasja; Sie, Daoud; van der Meer, David; Pepers, Floor; Bangma, Chris H; van Leenders, Geert J L H; Smid, Marcel; French, Pim J; Martens, John W M; van Workum, Wilbert; van der Spek, Peter J; Janssen, Bart; Caldenhoven, Eric; Rausch, Christian; de Jong, Mark; Stubbs, Andrew P; Meijer, Gerrit A; Fijneman, Remond J A; Jenster, Guido W
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2021
Subjects: Technical Note
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8673554/
https://www.ncbi.nlm.nih.gov/pubmed/34891161
http://dx.doi.org/10.1093/gigascience/giab080
Collection: PubMed
Abstract: BACKGROUND: Fusion genes are typically identified by RNA sequencing (RNA-seq) without elucidating the causal genomic breakpoints. However, non–poly(A)-enriched RNA-seq contains large proportions of intronic reads that also span genomic breakpoints. RESULTS: We have developed an algorithm, Dr. Disco, that searches for fusion transcripts by taking an entire reference genome into account as search space. This includes exons but also introns, intergenic regions, and sequences that do not meet splice junction motifs. Using 1,275 RNA-seq samples, we investigated to what extent genomic breakpoints can be extracted from RNA-seq data and their implications regarding poly(A)-enriched and ribosomal RNA–minus RNA-seq data. Comparison with whole-genome sequencing data revealed that most genomic breakpoints are not, or minimally, transcribed while, in contrast, the genomic breakpoints of all 32 TMPRSS2-ERG–positive tumours were present at RNA level. We also revealed tumours in which the ERG breakpoint was located before ERG, which co-existed with additional deletions and messenger RNA that incorporated intergenic cryptic exons. In breast cancer we identified rearrangement hot spots near CCND1 and in glioma near CDK4 and MDM2 and could directly associate this with increased expression. Furthermore, in all datasets we find fusions to intergenic regions, often spanning multiple cryptic exons that potentially encode neo-antigens. Thus, fusion transcripts other than classical gene-to-gene fusions are prominently present and can be identified using RNA-seq. CONCLUSION: By using the full potential of non–poly(A)-enriched RNA-seq data, sophisticated analysis can reliably identify expressed genomic breakpoints and their transcriptional effects.
Record ID: pubmed-8673554
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: GigaScience, Technical Note, published 2021-12-09
Rights: © The Author(s) 2021. Published by Oxford University Press GigaScience. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.