Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the b...

Bibliographic Details
Main Authors: Soneson, Charlotte, Love, Michael I., Robinson, Mark D.
Format: Online Article Text
Language: English
Published: F1000Research 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4712774/
https://www.ncbi.nlm.nih.gov/pubmed/26925227
http://dx.doi.org/10.12688/f1000research.7563.2
_version_ 1782410112480051200
author Soneson, Charlotte
Love, Michael I.
Robinson, Mark D.
collection PubMed
description High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
format Online
Article
Text
id pubmed-4712774
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher F1000Research
record_format MEDLINE/PubMed
spelling pubmed-4712774 2016-02-25 F1000Res Method Article F1000Research 2016-02-29 /pmc/articles/PMC4712774/ /pubmed/26925227 http://dx.doi.org/10.12688/f1000research.7563.2 Text en Copyright: © 2016 Soneson C et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences
topic Method Article