Transcriptome diversity is a systematic source of variation in RNA-sequencing data

Bibliographic Details
Main authors: García-Nieto, Pablo E., Wang, Ban, Fraser, Hunter B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8982896/
https://www.ncbi.nlm.nih.gov/pubmed/35324895
http://dx.doi.org/10.1371/journal.pcbi.1009939
Collection: PubMed
Description: RNA sequencing has been widely used as an essential tool to probe gene expression. While standard practices have been established to analyze RNA-seq data, it is still challenging to interpret and remove artifactual signals. Several biological and technical factors such as sex, age, batches, and sequencing technology have been found to bias these estimates. Probabilistic estimation of expression residuals (PEER), which infers broad variance components in gene expression measurements, has been used to account for some systematic effects, but it has remained challenging to interpret these PEER factors. Here we show that transcriptome diversity–a simple metric based on Shannon entropy–explains a large portion of variability in gene expression and is the strongest known factor encoded in PEER factors. We then show that transcriptome diversity has significant associations with multiple technical and biological variables across diverse organisms and datasets. In sum, transcriptome diversity provides a simple explanation for a major source of variation in both gene expression estimates and PEER covariates.
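The abstract describes transcriptome diversity only as "a simple metric based on Shannon entropy". A minimal sketch of that idea, assuming diversity is the Shannon entropy H = -Σ p_i log2(p_i) of each sample's gene-expression proportions, follows. The input units (counts vs. TPM), gene filtering and log base used in the paper are not stated in this record and may differ; the function name `transcriptome_diversity` and the toy matrix are hypothetical.

```python
import numpy as np


def transcriptome_diversity(expression):
    """Shannon entropy (bits) of each sample's expression distribution.

    `expression` is a genes x samples array of non-negative values, e.g.
    read counts or TPM (an assumption: the record does not say which
    units the paper uses).
    """
    x = np.asarray(expression, dtype=float)
    totals = x.sum(axis=0)
    if np.any(totals <= 0):
        raise ValueError("each sample needs a positive total expression")
    # Normalize each column (sample) so its gene proportions sum to 1.
    p = x / totals
    # Take log2 only where p > 0; zero-expression genes keep log = 0,
    # which implements the convention 0 * log(0) = 0.
    log_p = np.log2(p, out=np.zeros_like(p), where=p > 0)
    # H = -sum_i p_i * log2(p_i), one value per sample (column).
    return -(p * log_p).sum(axis=0)


# Hypothetical toy data: 4 genes x 2 samples.
# Sample 1 spreads expression evenly; sample 2 expresses a single gene.
toy = np.array([[1, 4],
                [1, 0],
                [1, 0],
                [1, 0]])
scores = transcriptome_diversity(toy)
```

Here the evenly spread sample scores 2 bits (the maximum, log2 of 4 genes) and the single-gene sample scores 0. Changing the log base only rescales the values, so it does not affect correlations with other covariates.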
ID: pubmed-8982896
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS Comput Biol
Publication date: 2022-03-24
License: © 2022 García-Nieto et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Topic: Research Article