A high-quality annotated transcriptome of swine peripheral blood

BACKGROUND: High throughput gene expression profiling assays of peripheral blood are widely used in biomedicine, as well as in animal genetics and physiology research. Accurate, comprehensive, and precise interpretation of such high throughput assays relies on well-characterized reference genomes an...

Bibliographic Details
Main Authors: Liu, Haibo, Smith, Timothy P.L., Nonneman, Dan J., Dekkers, Jack C.M., Tuggle, Christopher K.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5483264/
https://www.ncbi.nlm.nih.gov/pubmed/28646867
http://dx.doi.org/10.1186/s12864-017-3863-7
_version_ 1783245725293871104
author Liu, Haibo
Smith, Timothy P.L.
Nonneman, Dan J.
Dekkers, Jack C.M.
Tuggle, Christopher K.
author_facet Liu, Haibo
Smith, Timothy P.L.
Nonneman, Dan J.
Dekkers, Jack C.M.
Tuggle, Christopher K.
author_sort Liu, Haibo
collection PubMed
description BACKGROUND: High throughput gene expression profiling assays of peripheral blood are widely used in biomedicine, as well as in animal genetics and physiology research. Accurate, comprehensive, and precise interpretation of such high throughput assays relies on well-characterized reference genomes and/or transcriptomes. However, neither the reference genome nor the peripheral blood transcriptome of the pig has been sufficiently assembled and annotated to support such profiling assays in this emerging biomedical model organism. We aimed to assemble published and novel RNA-seq data to provide a comprehensive, well-annotated blood transcriptome for pigs by integrating a de novo assembly with a genome-guided assembly.
RESULTS: A de novo and a genome-guided transcriptome of porcine whole peripheral blood were assembled with ~162 million pairs of paired-end and ~183 million single-end, trimmed and normalized Illumina RNA-seq reads (~6 billion initial reads from 146 RNA-seq libraries) from five independent studies by using the Trinity and Cufflinks software, respectively. We then removed putative transcripts (PTs) of low confidence from both assemblies and merged the remaining PTs into an integrated transcriptome consisting of 132,928 PTs, with 126,225 (~95%) PTs from the de novo assembly and more than 91% of PTs spliced. In the integrated transcriptome, ~90% and 63% of PTs had significant sequence similarity to sequences in the NCBI NT and NR databases, respectively; 68,754 (~52%) PTs were annotated with 15,965 unique gene ontology (GO) terms; and 7,618 PTs annotated with Enzyme Commission codes were assigned to 134 pathways curated by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Full exon-intron junctions of 17,528 PTs were validated by PacBio IsoSeq full-length cDNA reads from 3 other porcine tissues, NCBI pig RefSeq mRNAs and transcripts from Ensembl Sscrofa10.2 annotation. Completeness of the 5’ termini of 37,569 PTs was validated by public cap analysis of gene expression (CAGE) data. By comparison to the Ensembl transcripts, we found that (1) the deduced precursors of 54,402 PTs shared at least one intron or exon with those of 18,437 Ensembl transcripts; (2) 12,262 PTs had both longer 5’ and 3’ termini than their maximally overlapping Ensembl transcripts; and (3) 41,838 spliced PTs were totally missing from the Sscrofa10.2 annotation. Similar results were obtained when the PTs were compared to the pig NCBI RefSeq mRNA collection.
CONCLUSIONS: We built, validated and annotated a comprehensive porcine blood transcriptome with significant improvement over the annotation of Ensembl Sscrofa10.2 and the pig NCBI RefSeq mRNAs, and laid a foundation for blood-based high throughput transcriptomic assays in pigs and for advancing annotation of the pig genome.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3863-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5483264
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5483264 2017-06-26 A high-quality annotated transcriptome of swine peripheral blood Liu, Haibo Smith, Timothy P.L. Nonneman, Dan J. Dekkers, Jack C.M. Tuggle, Christopher K. BMC Genomics Research Article BioMed Central 2017-06-24 /pmc/articles/PMC5483264/ /pubmed/28646867 http://dx.doi.org/10.1186/s12864-017-3863-7 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Liu, Haibo
Smith, Timothy P.L.
Nonneman, Dan J.
Dekkers, Jack C.M.
Tuggle, Christopher K.
A high-quality annotated transcriptome of swine peripheral blood
title A high-quality annotated transcriptome of swine peripheral blood
title_full A high-quality annotated transcriptome of swine peripheral blood
title_fullStr A high-quality annotated transcriptome of swine peripheral blood
title_full_unstemmed A high-quality annotated transcriptome of swine peripheral blood
title_short A high-quality annotated transcriptome of swine peripheral blood
title_sort high-quality annotated transcriptome of swine peripheral blood
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5483264/
https://www.ncbi.nlm.nih.gov/pubmed/28646867
http://dx.doi.org/10.1186/s12864-017-3863-7