Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts
Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (...
Main authors: | Cheng, Bing; Furtado, Agnelo; Henry, Robert J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2017 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737654/ https://www.ncbi.nlm.nih.gov/pubmed/29048540 http://dx.doi.org/10.1093/gigascience/gix086 |
_version_ | 1783287554759458816 |
---|---|
author | Cheng, Bing Furtado, Agnelo Henry, Robert J |
author_facet | Cheng, Bing Furtado, Agnelo Henry, Robert J |
author_sort | Cheng, Bing |
collection | PubMed |
description | Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of the assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose metabolism genes were selected as case studies. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTRs, C. arabica ESTs, and Rfam left 1213 sequences without hits; these are potential novel genes in coffee. Longer UTRs were captured, especially 5′ UTRs, facilitating the identification of upstream open reading frames. LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated. LRS technology exposes the limitations of previous studies. It provides an important tool for producing a reference transcriptome that captures more of the diversity of full-length transcripts, helping to understand the biology and support the genetic improvement of polyploid species such as coffee. |
format | Online Article Text |
id | pubmed-5737654 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5737654 2018-01-04 Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts Cheng, Bing Furtado, Agnelo Henry, Robert J Gigascience Research Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of the assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose metabolism genes were selected as case studies. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTRs, C. arabica ESTs, and Rfam left 1213 sequences without hits; these are potential novel genes in coffee. Longer UTRs were captured, especially 5′ UTRs, facilitating the identification of upstream open reading frames. LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated. LRS technology exposes the limitations of previous studies. It provides an important tool for producing a reference transcriptome that captures more of the diversity of full-length transcripts, helping to understand the biology and support the genetic improvement of polyploid species such as coffee.
Oxford University Press 2017-08-30 /pmc/articles/PMC5737654/ /pubmed/29048540 http://dx.doi.org/10.1093/gigascience/gix086 Text en © The Authors 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Cheng, Bing Furtado, Agnelo Henry, Robert J Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts |
title | Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts |
title_full | Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts |
title_fullStr | Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts |
title_full_unstemmed | Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts |
title_short | Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts |
title_sort | long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737654/ https://www.ncbi.nlm.nih.gov/pubmed/29048540 http://dx.doi.org/10.1093/gigascience/gix086 |
work_keys_str_mv | AT chengbing longreadsequencingofthecoffeebeantranscriptomerevealsthediversityoffulllengthtranscripts AT furtadoagnelo longreadsequencingofthecoffeebeantranscriptomerevealsthediversityoffulllengthtranscripts AT henryrobertj longreadsequencingofthecoffeebeantranscriptomerevealsthediversityoffulllengthtranscripts |