
A robust (re-)annotation approach to generate unbiased mapping references for RNA-seq-based analyses of differential expression across closely related species

BACKGROUND: RNA-seq based on short reads generated by next generation sequencing technologies has become the main approach to study differential gene expression. Until now, the main applications of this technique have been to study the variation of gene expression in a whole organism, tissue or cell...


Bibliographic Details
Main Authors: Torres-Oliva, Montserrat; Almudi, Isabel; McGregor, Alistair P.; Posnien, Nico
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877740/
https://www.ncbi.nlm.nih.gov/pubmed/27220689
http://dx.doi.org/10.1186/s12864-016-2646-x
author Torres-Oliva, Montserrat
Almudi, Isabel
McGregor, Alistair P.
Posnien, Nico
collection PubMed
description BACKGROUND: RNA-seq based on short reads generated by next-generation sequencing technologies has become the main approach to study differential gene expression. Until now, the main applications of this technique have been to study the variation of gene expression in a whole organism, tissue or cell type under different conditions or at different developmental stages. However, RNA-seq also has great potential to be used in evolutionary studies to investigate gene expression divergence in closely related species. RESULTS: We show that the published genomes and annotations of the three closely related Drosophila species D. melanogaster, D. simulans and D. mauritiana have limitations for inter-specific gene expression studies. This is due to missing gene models in at least one of the genome annotations, unclear orthology assignments and significant gene length differences in the different species. A comprehensive evaluation of four statistical frameworks (DESeq2, DESeq2 with length correction, RPKM-limma and RPKM-voom-limma) shows that none of these methods sufficiently accounts for inter-specific gene length differences, which inevitably results in false positive candidate genes. We propose that published reference genomes should be re-annotated before using them as references for RNA-seq experiments to include as many genes as possible and to account for a potential length bias. We present a straightforward reciprocal re-annotation pipeline that allows reliable comparison of expression for nearly all genes annotated in D. melanogaster. CONCLUSIONS: We conclude that our reciprocal re-annotation of previously published genomes facilitates the analysis of significantly more genes in an inter-specific differential gene expression study. We propose that the established pipeline can easily be applied to re-annotate other genomes of closely related animals and plants to improve comparative expression analyses.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2646-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4877740
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4877740 2016-05-25 BMC Genomics Methodology Article BioMed Central 2016-05-24 /pmc/articles/PMC4877740/ /pubmed/27220689 http://dx.doi.org/10.1186/s12864-016-2646-x Text en © Torres-Oliva et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A robust (re-)annotation approach to generate unbiased mapping references for RNA-seq-based analyses of differential expression across closely related species
topic Methodology Article