How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq
RNA-seq experiments estimate the number of genes expressed in a transcriptome as well as their relative frequencies. However, an undetermined number of genes can remain undetected due to their low expression relative to the sample size (sequence depth). Estimation of the true number of genes express...
Main authors: | García-Ortega, Luis Fernando; Martínez, Octavio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4479379/ https://www.ncbi.nlm.nih.gov/pubmed/26107654 http://dx.doi.org/10.1371/journal.pone.0130262 |
_version_ | 1782378000540499968 |
---|---|
author | García-Ortega, Luis Fernando Martínez, Octavio |
author_facet | García-Ortega, Luis Fernando Martínez, Octavio |
author_sort | García-Ortega, Luis Fernando |
collection | PubMed |
description | RNA-seq experiments estimate the number of genes expressed in a transcriptome as well as their relative frequencies. However, an undetermined number of genes can remain undetected due to their low expression relative to the sample size (sequence depth). Estimation of the true number of genes expressed in a transcriptome is essential in order to determine which genes are exclusively expressed in specific tissues or under particular conditions. A reliable estimate of the true number of expressed genes is also required to accurately measure transcriptome changes and to predict the sequencing depth needed to increase the proportion of detected genes. This problem is analogous to ecological sampling problems such as estimating the number of species at a given site. Here we present a non-parametric estimator for the number of undetected genes as well as for the extra sample size needed to detect a given proportion of the undetected genes. Our estimators are superior to ones already published by having smaller standard errors and biases. We applied our method to a set of 32 publicly available RNA-seq experiments, including the evaluation of 311 individually sequenced libraries. We found that in the majority of the cases more than one thousand genes are undetected, and that on average approximately 6% of the expressed genes per accession remain undetected. This figure increases to approximately 10% if individual sequencing libraries are analyzed. Our method is also applicable to metagenomic experiments. Using our method, the number of undetected genes as well as the sample size needed to detect them can be calculated, leading to more accurate and complete gene expression studies. |
format | Online Article Text |
id | pubmed-4479379 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4479379 2015-06-29 How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq García-Ortega, Luis Fernando Martínez, Octavio PLoS One Research Article Public Library of Science 2015-06-24 /pmc/articles/PMC4479379/ /pubmed/26107654 http://dx.doi.org/10.1371/journal.pone.0130262 Text en © 2015 García-Ortega, Martínez http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article García-Ortega, Luis Fernando Martínez, Octavio How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq |
title | How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq |
title_full | How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq |
title_fullStr | How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq |
title_full_unstemmed | How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq |
title_short | How Many Genes Are Expressed in a Transcriptome? Estimation and Results for RNA-Seq |
title_sort | how many genes are expressed in a transcriptome? estimation and results for rna-seq |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4479379/ https://www.ncbi.nlm.nih.gov/pubmed/26107654 http://dx.doi.org/10.1371/journal.pone.0130262 |
work_keys_str_mv | AT garciaortegaluisfernando howmanygenesareexpressedinatranscriptomeestimationandresultsforrnaseq AT martinezoctavio howmanygenesareexpressedinatranscriptomeestimationandresultsforrnaseq |