The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity

Bibliographic Details
Main Authors: Reese, Fairlie; Williams, Brian; Balderrama-Gutierrez, Gabriela; Wyman, Dana; Çelik, Muhammed Hasan; Rebboah, Elisabeth; Rezaie, Narges; Trout, Diane; Razavi-Mohseni, Milad; Jiang, Yunzhe; Borsari, Beatrice; Morabito, Samuel; Liang, Heidi Yahan; McGill, Cassandra J.; Rahmanian, Sorena; Sakr, Jasmine; Jiang, Shan; Zeng, Weihua; Carvalho, Klebea; Weimer, Annika K.; Dionne, Louise A.; McShane, Ariel; Bedi, Karan; Elhajjajy, Shaimae I.; Upchurch, Sean; Jou, Jennifer; Youngworth, Ingrid; Gabdank, Idan; Sud, Paul; Jolanki, Otto; Strattan, J. Seth; Kagda, Meenakshi S.; Snyder, Michael P.; Hitz, Ben C.; Moore, Jill E.; Weng, Zhiping; Bennett, David; Reinholdt, Laura; Ljungman, Mats; Beer, Michael A.; Gerstein, Mark B.; Pachter, Lior; Guigó, Roderic; Wold, Barbara J.; Mortazavi, Ali
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10245583/
https://www.ncbi.nlm.nih.gov/pubmed/37292896
http://dx.doi.org/10.1101/2023.05.15.540865
author Reese, Fairlie
Williams, Brian
Balderrama-Gutierrez, Gabriela
Wyman, Dana
Çelik, Muhammed Hasan
Rebboah, Elisabeth
Rezaie, Narges
Trout, Diane
Razavi-Mohseni, Milad
Jiang, Yunzhe
Borsari, Beatrice
Morabito, Samuel
Liang, Heidi Yahan
McGill, Cassandra J.
Rahmanian, Sorena
Sakr, Jasmine
Jiang, Shan
Zeng, Weihua
Carvalho, Klebea
Weimer, Annika K.
Dionne, Louise A.
McShane, Ariel
Bedi, Karan
Elhajjajy, Shaimae I.
Upchurch, Sean
Jou, Jennifer
Youngworth, Ingrid
Gabdank, Idan
Sud, Paul
Jolanki, Otto
Strattan, J. Seth
Kagda, Meenakshi S.
Snyder, Michael P.
Hitz, Ben C.
Moore, Jill E.
Weng, Zhiping
Bennett, David
Reinholdt, Laura
Ljungman, Mats
Beer, Michael A.
Gerstein, Mark B.
Pachter, Lior
Guigó, Roderic
Wold, Barbara J.
Mortazavi, Ali
author_sort Reese, Fairlie
collection PubMed
description The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3’ end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3’ processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
format Online
Article
Text
id pubmed-10245583
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10245583 2023-06-08 bioRxiv Article Cold Spring Harbor Laboratory 2023-05-16 /pmc/articles/PMC10245583/ /pubmed/37292896 http://dx.doi.org/10.1101/2023.05.15.540865 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10245583/
https://www.ncbi.nlm.nih.gov/pubmed/37292896
http://dx.doi.org/10.1101/2023.05.15.540865