
A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis

BACKGROUND: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative s...


Bibliographic Details
Main Authors: Zhang, Runxuan, Kuo, Richard, Coulter, Max, Calixto, Cristiane P. G., Entizne, Juan Carlos, Guo, Wenbin, Marquez, Yamile, Milne, Linda, Riegler, Stefan, Matsui, Akihiro, Tanaka, Maho, Harvey, Sarah, Gao, Yubang, Wießner-Kroh, Theresa, Paniagua, Alejandro, Crespi, Martin, Denby, Katherine, Hur, Asa ben, Huq, Enamul, Jantsch, Michael, Jarmolowski, Artur, Koester, Tino, Laubinger, Sascha, Li, Qingshun Quinn, Gu, Lianfeng, Seki, Motoaki, Staiger, Dorothee, Sunkar, Ramanjulu, Szweykowska-Kulinska, Zofia, Tu, Shih-Long, Wachter, Andreas, Waugh, Robbie, Xiong, Liming, Zhang, Xiao-Ning, Conesa, Ana, Reddy, Anireddy S. N., Barta, Andrea, Kalyna, Maria, Brown, John W. S.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264592/
https://www.ncbi.nlm.nih.gov/pubmed/35799267
http://dx.doi.org/10.1186/s13059-022-02711-0
_version_ 1784742996759019520
author Zhang, Runxuan
Kuo, Richard
Coulter, Max
Calixto, Cristiane P. G.
Entizne, Juan Carlos
Guo, Wenbin
Marquez, Yamile
Milne, Linda
Riegler, Stefan
Matsui, Akihiro
Tanaka, Maho
Harvey, Sarah
Gao, Yubang
Wießner-Kroh, Theresa
Paniagua, Alejandro
Crespi, Martin
Denby, Katherine
Hur, Asa ben
Huq, Enamul
Jantsch, Michael
Jarmolowski, Artur
Koester, Tino
Laubinger, Sascha
Li, Qingshun Quinn
Gu, Lianfeng
Seki, Motoaki
Staiger, Dorothee
Sunkar, Ramanjulu
Szweykowska-Kulinska, Zofia
Tu, Shih-Long
Wachter, Andreas
Waugh, Robbie
Xiong, Liming
Zhang, Xiao-Ning
Conesa, Ana
Reddy, Anireddy S. N.
Barta, Andrea
Kalyna, Maria
Brown, John W. S.
author_facet Zhang, Runxuan
Kuo, Richard
Coulter, Max
Calixto, Cristiane P. G.
Entizne, Juan Carlos
Guo, Wenbin
Marquez, Yamile
Milne, Linda
Riegler, Stefan
Matsui, Akihiro
Tanaka, Maho
Harvey, Sarah
Gao, Yubang
Wießner-Kroh, Theresa
Paniagua, Alejandro
Crespi, Martin
Denby, Katherine
Hur, Asa ben
Huq, Enamul
Jantsch, Michael
Jarmolowski, Artur
Koester, Tino
Laubinger, Sascha
Li, Qingshun Quinn
Gu, Lianfeng
Seki, Motoaki
Staiger, Dorothee
Sunkar, Ramanjulu
Szweykowska-Kulinska, Zofia
Tu, Shih-Long
Wachter, Andreas
Waugh, Robbie
Xiong, Liming
Zhang, Xiao-Ning
Conesa, Ana
Reddy, Anireddy S. N.
Barta, Andrea
Kalyna, Maria
Brown, John W. S.
author_sort Zhang, Runxuan
collection PubMed
description BACKGROUND: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. RESULTS: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts—twice that of the best current Arabidopsis transcriptome and including over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. CONCLUSIONS: AtRTD3 is the most comprehensive Arabidopsis transcriptome currently. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-022-02711-0.
format Online
Article
Text
id pubmed-9264592
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9264592 2022-07-09 A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis Zhang, Runxuan Kuo, Richard Coulter, Max Calixto, Cristiane P. G. Entizne, Juan Carlos Guo, Wenbin Marquez, Yamile Milne, Linda Riegler, Stefan Matsui, Akihiro Tanaka, Maho Harvey, Sarah Gao, Yubang Wießner-Kroh, Theresa Paniagua, Alejandro Crespi, Martin Denby, Katherine Hur, Asa ben Huq, Enamul Jantsch, Michael Jarmolowski, Artur Koester, Tino Laubinger, Sascha Li, Qingshun Quinn Gu, Lianfeng Seki, Motoaki Staiger, Dorothee Sunkar, Ramanjulu Szweykowska-Kulinska, Zofia Tu, Shih-Long Wachter, Andreas Waugh, Robbie Xiong, Liming Zhang, Xiao-Ning Conesa, Ana Reddy, Anireddy S. N. Barta, Andrea Kalyna, Maria Brown, John W. S. Genome Biol Research BACKGROUND: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. RESULTS: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts—twice that of the best current Arabidopsis transcriptome and including over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions.
Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. CONCLUSIONS: AtRTD3 is the most comprehensive Arabidopsis transcriptome currently. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-022-02711-0. BioMed Central 2022-07-07 /pmc/articles/PMC9264592/ /pubmed/35799267 http://dx.doi.org/10.1186/s13059-022-02711-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Zhang, Runxuan
Kuo, Richard
Coulter, Max
Calixto, Cristiane P. G.
Entizne, Juan Carlos
Guo, Wenbin
Marquez, Yamile
Milne, Linda
Riegler, Stefan
Matsui, Akihiro
Tanaka, Maho
Harvey, Sarah
Gao, Yubang
Wießner-Kroh, Theresa
Paniagua, Alejandro
Crespi, Martin
Denby, Katherine
Hur, Asa ben
Huq, Enamul
Jantsch, Michael
Jarmolowski, Artur
Koester, Tino
Laubinger, Sascha
Li, Qingshun Quinn
Gu, Lianfeng
Seki, Motoaki
Staiger, Dorothee
Sunkar, Ramanjulu
Szweykowska-Kulinska, Zofia
Tu, Shih-Long
Wachter, Andreas
Waugh, Robbie
Xiong, Liming
Zhang, Xiao-Ning
Conesa, Ana
Reddy, Anireddy S. N.
Barta, Andrea
Kalyna, Maria
Brown, John W. S.
A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
title A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
title_full A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
title_fullStr A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
title_full_unstemmed A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
title_short A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
title_sort high-resolution single-molecule sequencing-based arabidopsis transcriptome using novel methods of iso-seq analysis
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264592/
https://www.ncbi.nlm.nih.gov/pubmed/35799267
http://dx.doi.org/10.1186/s13059-022-02711-0