Long read sequencing to reveal the full complexity of a plant transcriptome by targeting both standard and long workflows

Bibliographic Details
Main authors: Al-Dossary, Othman; Furtado, Agnelo; KharabianMasouleh, Ardashir; Alsubaie, Bader; Al-Mssallem, Ibrahim; Henry, Robert J.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10589961/
https://www.ncbi.nlm.nih.gov/pubmed/37865785
http://dx.doi.org/10.1186/s13007-023-01091-1
_version_ 1785123896325832704
author Al-Dossary, Othman
Furtado, Agnelo
KharabianMasouleh, Ardashir
Alsubaie, Bader
Al-Mssallem, Ibrahim
Henry, Robert J.
author_facet Al-Dossary, Othman
Furtado, Agnelo
KharabianMasouleh, Ardashir
Alsubaie, Bader
Al-Mssallem, Ibrahim
Henry, Robert J.
author_sort Al-Dossary, Othman
collection PubMed
description BACKGROUND: Long read sequencing allows the analysis of full-length transcripts in plants without the challenges of reliable transcriptome assembly. Long read sequencing of transcripts from plant genomes has often utilized sized transcript libraries. However, the value of including libraries of differing sizes has not been established. METHODS: A comprehensive transcriptome of the leaves of Jojoba (Simmondsia chinensis) was generated from two different PacBio library preparations: standard workflow (SW) and long workflow (LW). RESULTS: The importance of using both transcript groups in the analysis was demonstrated by the high proportion of unique sequences (74.6%) that were not shared between the groups. A total of 37.8% longer transcripts were only detected in the long dataset. The completeness of the combined transcriptome was indicated by the presence of 98.7% of genes predicted in the jojoba male reference genome. The high coverage of the transcriptome was further confirmed by BUSCO analysis showing the presence of 96.9% of the genes from the core viridiplantae_odb10 lineage. The high-quality isoforms post Cd-Hit merged dataset of the two workflows had a total of 167,866 isoforms. Most of the transcript isoforms were protein-coding sequences (71.7%) containing open reading frames (ORFs) ≥ 100 amino acids (aa). Alternative splicing and intron retention were the basis of most transcript diversity when analysed at the whole genome level and by specific analysis of the apetala2 gene families. CONCLUSION: This suggests the need to specifically target the capture of longer transcripts to provide more comprehensive genome coverage in plant transcriptome analysis and reveal the high level of alternative splicing. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13007-023-01091-1.
format Online
Article
Text
id pubmed-10589961
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-105899612023-10-22 Long read sequencing to reveal the full complexity of a plant transcriptome by targeting both standard and long workflows Al-Dossary, Othman Furtado, Agnelo KharabianMasouleh, Ardashir Alsubaie, Bader Al-Mssallem, Ibrahim Henry, Robert J. Plant Methods Methodology BACKGROUND: Long read sequencing allows the analysis of full-length transcripts in plants without the challenges of reliable transcriptome assembly. Long read sequencing of transcripts from plant genomes has often utilized sized transcript libraries. However, the value of including libraries of differing sizes has not been established. METHODS: A comprehensive transcriptome of the leaves of Jojoba (Simmondsia chinensis) was generated from two different PacBio library preparations: standard workflow (SW) and long workflow (LW). RESULTS: The importance of using both transcript groups in the analysis was demonstrated by the high proportion of unique sequences (74.6%) that were not shared between the groups. A total of 37.8% longer transcripts were only detected in the long dataset. The completeness of the combined transcriptome was indicated by the presence of 98.7% of genes predicted in the jojoba male reference genome. The high coverage of the transcriptome was further confirmed by BUSCO analysis showing the presence of 96.9% of the genes from the core viridiplantae_odb10 lineage. The high-quality isoforms post Cd-Hit merged dataset of the two workflows had a total of 167,866 isoforms. Most of the transcript isoforms were protein-coding sequences (71.7%) containing open reading frames (ORFs) ≥ 100 amino acids (aa). Alternative splicing and intron retention were the basis of most transcript diversity when analysed at the whole genome level and by specific analysis of the apetala2 gene families. 
CONCLUSION: This suggests the need to specifically target the capture of longer transcripts to provide more comprehensive genome coverage in plant transcriptome analysis and reveal the high level of alternative splicing. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13007-023-01091-1. BioMed Central 2023-10-21 /pmc/articles/PMC10589961/ /pubmed/37865785 http://dx.doi.org/10.1186/s13007-023-01091-1 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology
Al-Dossary, Othman
Furtado, Agnelo
KharabianMasouleh, Ardashir
Alsubaie, Bader
Al-Mssallem, Ibrahim
Henry, Robert J.
Long read sequencing to reveal the full complexity of a plant transcriptome by targeting both standard and long workflows
title Long read sequencing to reveal the full complexity of a plant transcriptome by targeting both standard and long workflows
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10589961/
https://www.ncbi.nlm.nih.gov/pubmed/37865785
http://dx.doi.org/10.1186/s13007-023-01091-1