
Optimization of next‐generation sequencing transcriptome annotation for species lacking sequenced genomes

Next‐generation sequencing methods, such as RNA‐seq, have permitted the exploration of gene expression in a range of organisms which have been studied in ecological contexts but lack a sequenced genome. However, the efficacy and accuracy of RNA‐seq annotation methods using reference genomes from rel...


Bibliographic Details
Main authors: Ockendon, Nina F., O'Connell, Lauren A., Bush, Stephen J., Monzón‐Sandoval, Jimena, Barnes, Holly, Székely, Tamás, Hofmann, Hans A., Dorus, Steve, Urrutia, Araxi O.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2015
Subjects: RESOURCE ARTICLES
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4982090/
https://www.ncbi.nlm.nih.gov/pubmed/26358618
http://dx.doi.org/10.1111/1755-0998.12465
collection PubMed
description Next‐generation sequencing methods, such as RNA‐seq, have permitted the exploration of gene expression in a range of organisms which have been studied in ecological contexts but lack a sequenced genome. However, the efficacy and accuracy of RNA‐seq annotation methods using reference genomes from related species have yet to be robustly characterized. Here we conduct a comprehensive power analysis employing RNA‐seq data from Drosophila melanogaster in conjunction with 11 additional genomes from related Drosophila species to compare annotation methods and quantify the impact of evolutionary divergence between transcriptome and the reference genome. Our analyses demonstrate that, regardless of the level of sequence divergence, direct genome mapping (DGM), where transcript short reads are aligned directly to the reference genome, significantly outperforms the widely used de novo and guided assembly‐based methods in both the quantity and accuracy of gene detection. Our analysis also reveals that DGM recovers a more representative profile of Gene Ontology functional categories, which are often used to interpret emergent patterns in genomewide expression analyses. Lastly, analysis of available primate RNA‐seq data demonstrates the applicability of our observations across diverse taxa. Our quantification of annotation accuracy and reduced gene detection associated with sequence divergence thus provides empirically derived guidelines for the design of future gene expression studies in species without sequenced genomes.
format Online
Article
Text
id pubmed-4982090
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-4982090 2016-08-26 Mol Ecol Resour RESOURCE ARTICLES John Wiley and Sons Inc.
2015-10-14 2016-03 /pmc/articles/PMC4982090/ /pubmed/26358618 http://dx.doi.org/10.1111/1755-0998.12465 Text en © 2015 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
topic RESOURCE ARTICLES