The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity
The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3’ end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because tran...
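Main authors: | Reese, Fairlie; Williams, Brian; Balderrama-Gutierrez, Gabriela; Wyman, Dana; Çelik, Muhammed Hasan; Rebboah, Elisabeth; Rezaie, Narges; Trout, Diane; Razavi-Mohseni, Milad; Jiang, Yunzhe; Borsari, Beatrice; Morabito, Samuel; Liang, Heidi Yahan; McGill, Cassandra J.; Rahmanian, Sorena; Sakr, Jasmine; Jiang, Shan; Zeng, Weihua; Carvalho, Klebea; Weimer, Annika K.; Dionne, Louise A.; McShane, Ariel; Bedi, Karan; Elhajjajy, Shaimae I.; Upchurch, Sean; Jou, Jennifer; Youngworth, Ingrid; Gabdank, Idan; Sud, Paul; Jolanki, Otto; Strattan, J. Seth; Kagda, Meenakshi S.; Snyder, Michael P.; Hitz, Ben C.; Moore, Jill E.; Weng, Zhiping; Bennett, David; Reinholdt, Laura; Ljungman, Mats; Beer, Michael A.; Gerstein, Mark B.; Pachter, Lior; Guigó, Roderic; Wold, Barbara J.; Mortazavi, Ali |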
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10245583/ https://www.ncbi.nlm.nih.gov/pubmed/37292896 http://dx.doi.org/10.1101/2023.05.15.540865 |
_version_ | 1785054891119476736 |
---|---|
author | Reese, Fairlie Williams, Brian Balderrama-Gutierrez, Gabriela Wyman, Dana Çelik, Muhammed Hasan Rebboah, Elisabeth Rezaie, Narges Trout, Diane Razavi-Mohseni, Milad Jiang, Yunzhe Borsari, Beatrice Morabito, Samuel Liang, Heidi Yahan McGill, Cassandra J. Rahmanian, Sorena Sakr, Jasmine Jiang, Shan Zeng, Weihua Carvalho, Klebea Weimer, Annika K. Dionne, Louise A. McShane, Ariel Bedi, Karan Elhajjajy, Shaimae I. Upchurch, Sean Jou, Jennifer Youngworth, Ingrid Gabdank, Idan Sud, Paul Jolanki, Otto Strattan, J. Seth Kagda, Meenakshi S. Snyder, Michael P. Hitz, Ben C. Moore, Jill E. Weng, Zhiping Bennett, David Reinholdt, Laura Ljungman, Mats Beer, Michael A. Gerstein, Mark B. Pachter, Lior Guigó, Roderic Wold, Barbara J. Mortazavi, Ali |
author_sort | Reese, Fairlie |
collection | PubMed |
description | The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3’ end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3’ processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection. |
format | Online Article Text |
id | pubmed-10245583 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10245583 2023-06-08 The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity bioRxiv Article Cold Spring Harbor Laboratory 2023-05-16 /pmc/articles/PMC10245583/ /pubmed/37292896 http://dx.doi.org/10.1101/2023.05.15.540865 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity |
title_sort | encode4 long-read rna-seq collection reveals distinct classes of transcript structure diversity |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10245583/ https://www.ncbi.nlm.nih.gov/pubmed/37292896 http://dx.doi.org/10.1101/2023.05.15.540865 |