
Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments

BACKGROUND: High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and com...


Bibliographic Details
Main authors: Bullard, James H; Purdom, Elizabeth; Hansen, Kasper D; Dudoit, Sandrine
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2838869/
https://www.ncbi.nlm.nih.gov/pubmed/20167110
http://dx.doi.org/10.1186/1471-2105-11-94
author Bullard, James H
Purdom, Elizabeth
Hansen, Kasper D
Dudoit, Sandrine
collection PubMed
description BACKGROUND: High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data. RESULTS: We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as varying gene lengths, base-calling calibration method (with and without phi X control lane), and flow-cell/library preparation effects. We investigate the impact of the read count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection. CONCLUSIONS: Our results have significant practical and methodological implications for the design and analysis of mRNA-Seq experiments. They highlight the importance of appropriate statistical methods for normalization and DE inference, to account for features of the sequencing platform that could impact the accuracy of results. They also reveal the need for further research in the development of statistical and computational methods for mRNA-Seq.
format Text
id pubmed-2838869
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2838869 2010-03-16 Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments Bullard, James H Purdom, Elizabeth Hansen, Kasper D Dudoit, Sandrine BMC Bioinformatics Research article
BioMed Central 2010-02-18 /pmc/articles/PMC2838869/ /pubmed/20167110 http://dx.doi.org/10.1186/1471-2105-11-94 Text en Copyright ©2010 Bullard et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2838869/
https://www.ncbi.nlm.nih.gov/pubmed/20167110
http://dx.doi.org/10.1186/1471-2105-11-94