
Impact of RNA-seq attributes on false positive rates in differential expression analysis of de novo assembled transcriptomes

BACKGROUND: High-throughput RNA sequencing studies are becoming increasingly popular, and differential expression studies represent an important downstream analysis that often follows de novo transcriptome assembly. Although much attention has been given to bioinformatics tools for differential gene exp...


Bibliographic Details
Main authors: González, Emmanuel; Joly, Simon
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4222115/
https://www.ncbi.nlm.nih.gov/pubmed/24298906
http://dx.doi.org/10.1186/1756-0500-6-503
author González, Emmanuel
Joly, Simon
collection PubMed
description BACKGROUND: High-throughput RNA sequencing studies are becoming increasingly popular, and differential expression studies represent an important downstream analysis that often follows de novo transcriptome assembly. Although much attention has been given to bioinformatics tools for differential gene expression, little has yet been given to the impact of the sequence data itself used in pipelines. RESULTS: We tested how using reads of a different type from those used to assemble a de novo transcriptome (differing in both length and pairing attributes) could affect differential expression (DE) results. To investigate this, we created artificial datasets out of the long paired-end RNA-seq datasets initially used to build the assembly. All datasets were compared via DE analyses, and because all samples come from the same sequencing run, any DE of genes or isoforms can be interpreted as a false positive resulting from sequence attributes. Although the false positive rate for differential gene expression does not seem to be strongly affected by sequencing strategy (maximum of 3.5%), it could reach 12.2% or 28.1% for differential isoform expression, depending on the pipeline used. The choice of paired-end vs. single-end strategy was found to have a much greater impact on false positives than sequence length. CONCLUSION: In light of the false positive rate results, we recommend using paired-end over single-end sequences in differential expression studies, even if the impact is less serious for differential gene expression.
format Online
Article
Text
id pubmed-4222115
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4222115 2014-11-07 González, Emmanuel; Joly, Simon. BMC Res Notes. Research Article. BioMed Central 2013-12-03 /pmc/articles/PMC4222115/ /pubmed/24298906 http://dx.doi.org/10.1186/1756-0500-6-503 Text en Copyright © 2013 González and Joly; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Impact of RNA-seq attributes on false positive rates in differential expression analysis of de novo assembled transcriptomes
topic Research Article