
A complete statistical model for calibration of RNA-seq counts using external spike-ins and maximum likelihood theory

A fundamental assumption, common to the vast majority of high-throughput transcriptome analyses, is that the expression of most genes is unchanged among samples and that total cellular RNA remains constant. As the number of analyzed experimental systems increases, however, different independent studi...


Bibliographic Details
Main authors: Athanasiadou, Rodoniki; Neymotin, Benjamin; Brandt, Nathan; Wang, Wei; Christiaen, Lionel; Gresham, David; Tranchina, Daniel
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6428340/
https://www.ncbi.nlm.nih.gov/pubmed/30856174
http://dx.doi.org/10.1371/journal.pcbi.1006794
author Athanasiadou, Rodoniki
Neymotin, Benjamin
Brandt, Nathan
Wang, Wei
Christiaen, Lionel
Gresham, David
Tranchina, Daniel
collection PubMed
description A fundamental assumption, common to the vast majority of high-throughput transcriptome analyses, is that the expression of most genes is unchanged among samples and that total cellular RNA remains constant. As the number of analyzed experimental systems increases, however, different independent studies demonstrate that this assumption is often violated. We present a calibration method using RNA spike-ins that allows for the measurement of absolute cellular abundance of RNA molecules. We apply the method to pooled RNA from cell populations of known sizes. For each transcript, we compute a nominal abundance that can be converted to absolute by dividing by a scale factor determined in separate experiments: the yield coefficient of the transcript relative to that of a reference spike-in measured with the same protocol. The method is derived by maximum likelihood theory in the context of a complete statistical model for sequencing counts contributed by cellular RNA and spike-ins. The counts are based on a sample from a fixed number of cells to which a fixed population of spike-in molecules has been added. We illustrate and evaluate the method with applications to two global expression data sets: one from the model eukaryote Saccharomyces cerevisiae proliferating at different growth rates, and one from differentiating cardiopharyngeal cell lineages in the chordate Ciona robusta. We tested the method in a technical replicate dilution study and in a k-fold validation study.
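The conversion mentioned in the description (nominal abundance divided by a relative yield coefficient) can be illustrated with a small sketch. This is not the authors' statistical model: it assumes, purely for illustration, that each spike-in count is Poisson with mean proportional to the number of molecules added, with a single shared rate. Under that assumption the maximum likelihood estimate of the rate is the total spike-in count divided by the total number of spike-in molecules. All function names, parameter names, and numbers below are hypothetical.

```python
from typing import Sequence


def capture_rate_mle(spike_counts: Sequence[int],
                     spike_molecules: Sequence[float]) -> float:
    """Reads per added molecule under the simplified assumption
    count_i ~ Poisson(rate * molecules_i) with one shared rate.
    The ML estimate is sum(counts) / sum(molecules)."""
    if len(spike_counts) != len(spike_molecules):
        raise ValueError("counts and molecule numbers must align")
    total_molecules = sum(spike_molecules)
    if total_molecules <= 0:
        raise ValueError("at least one spike-in molecule is required")
    return sum(spike_counts) / total_molecules


def nominal_abundance_per_cell(transcript_count: int,
                               capture_rate: float,
                               n_cells: int) -> float:
    """Molecules per cell implied by the count if the transcript
    behaved exactly like the spike-ins (hypothetical helper)."""
    return transcript_count / (capture_rate * n_cells)


def absolute_abundance_per_cell(nominal: float,
                                relative_yield: float) -> float:
    """Divide the nominal abundance by the transcript's yield
    coefficient relative to the reference spike-in."""
    if relative_yield <= 0:
        raise ValueError("relative yield must be positive")
    return nominal / relative_yield


# Made-up numbers, for illustration only.
rate = capture_rate_mle([100, 200, 700], [1000.0, 2000.0, 7000.0])
nominal = nominal_abundance_per_cell(500, rate, n_cells=1000)
absolute = absolute_abundance_per_cell(nominal, relative_yield=0.5)
print(rate, nominal, absolute)
```

A relative yield below 1 means the transcript is recovered less efficiently than the reference spike-in, so the absolute abundance comes out larger than the nominal one.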
format Online
Article
Text
id pubmed-6428340
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6428340 2019-04-01 PLoS Comput Biol Research Article
Public Library of Science 2019-03-11 /pmc/articles/PMC6428340/ /pubmed/30856174 http://dx.doi.org/10.1371/journal.pcbi.1006794 Text en © 2019 Athanasiadou et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A complete statistical model for calibration of RNA-seq counts using external spike-ins and maximum likelihood theory
topic Research Article
work_keys_str_mv AT athanasiadourodoniki acompletestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT neymotinbenjamin acompletestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT brandtnathan acompletestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT wangwei acompletestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT christiaenlionel acompletestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT greshamdavid acompletestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT tranchinadaniel acompletestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT athanasiadourodoniki completestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT neymotinbenjamin completestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT brandtnathan completestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT wangwei completestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT christiaenlionel completestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT greshamdavid completestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory
AT tranchinadaniel completestatisticalmodelforcalibrationofrnaseqcountsusingexternalspikeinsandmaximumlikelihoodtheory