Custom selected reference genes outperform pre-defined reference genes in transcriptomic analysis

BACKGROUND: RNA sequencing allows the measuring of gene expression at a resolution unmet by expression arrays or RT-qPCR. It is however necessary to normalize sequencing data by library size, transcript size and composition, among other factors, before comparing expression levels. The use of interna...

Bibliographic Details
Main authors: dos Santos, Karen Cristine Gonçalves, Desgagné-Penix, Isabel, Germain, Hugo
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954607/
https://www.ncbi.nlm.nih.gov/pubmed/31924161
http://dx.doi.org/10.1186/s12864-019-6426-2
author dos Santos, Karen Cristine Gonçalves
Desgagné-Penix, Isabel
Germain, Hugo
collection PubMed
description BACKGROUND: RNA sequencing allows the measuring of gene expression at a resolution unmet by expression arrays or RT-qPCR. It is however necessary to normalize sequencing data by library size, transcript size and composition, among other factors, before comparing expression levels. The use of internal control genes or spike-ins is advocated in the literature for scaling read counts, but the methods for choosing reference genes are mostly targeted at RT-qPCR studies and require a set of pre-selected candidate controls or pre-selected target genes. RESULTS: Here, we report an R-based pipeline to select internal control genes based solely on read counts and gene sizes. This novel method first normalizes the read counts to Transcripts per Million (TPM) and then excludes weakly expressed genes using the DAFS script to calculate the cut-off. It then selects as references the genes with lowest TPM coefficient of variation. We used this method to pick custom reference genes for the differential expression analysis of three transcriptome sets from transgenic Arabidopsis plants expressing heterologous fungal effector proteins tagged with GFP (using GFP alone as the control). The custom reference genes showed lower coefficient of variation and fold change as well as a broader range of expression levels than commonly used reference genes. When analyzed with NormFinder, both typical and custom reference genes were considered suitable internal controls, but the custom selected genes were more stably expressed. geNorm produced a similar result in which most custom selected genes ranked higher (i.e. were more stably expressed) than commonly used reference genes. CONCLUSIONS: The proposed method is innovative, rapid and simple. Since it does not depend on genome annotation, it can be used with any organism, and does not require pre-selected reference candidates or target genes that are not always available.
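The selection steps named in the description (normalize counts to TPM, drop weakly expressed genes, keep the genes with the lowest TPM coefficient of variation) can be illustrated with a minimal Python sketch. It is not the authors' R pipeline: the function names `tpm` and `select_reference_genes`, the parameters `min_tpm` and `n_refs`, and the toy data are all hypothetical, and a fixed TPM threshold is swapped in where the article uses the DAFS script to compute a data-driven cut-off.

```python
import numpy as np


def tpm(counts, lengths_bp):
    """Convert a genes x samples matrix of raw read counts to TPM.

    lengths_bp holds one gene length (in base pairs) per row of counts.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1000.0
    rpk = counts / lengths_kb[:, None]  # reads per kilobase of gene length
    # Scale each sample (column) so that its values sum to one million.
    return rpk / rpk.sum(axis=0, keepdims=True) * 1e6


def select_reference_genes(counts, lengths_bp, gene_ids, min_tpm=1.0, n_refs=3):
    """Return up to n_refs (gene_id, cv) pairs with the lowest TPM CV.

    ASSUMPTION: min_tpm is a fixed, hypothetical cut-off (must be > 0)
    used here instead of the DAFS-derived cut-off described in the article.
    """
    t = tpm(counts, lengths_bp)
    # Keep only genes that reach the cut-off in every sample.
    expressed = (t >= min_tpm).all(axis=1)
    mean = t.mean(axis=1)
    sd = t.std(axis=1, ddof=1)  # sample standard deviation across samples
    cv = np.full(t.shape[0], np.inf)  # excluded genes sort to the end
    cv[expressed] = sd[expressed] / mean[expressed]
    order = np.argsort(cv, kind="stable")[:n_refs]
    return [(gene_ids[i], float(cv[i])) for i in order if expressed[i]]


if __name__ == "__main__":
    # Toy data (invented for illustration): 4 genes x 3 samples.
    counts = [
        [100, 200, 300],  # A: scales with library size
        [50, 100, 150],   # B: scales with library size
        [10, 200, 40],    # C: strongly variable
        [0, 0, 1],        # D: weakly expressed, filtered out
    ]
    lengths = [1000, 1000, 1000, 1000]
    for gene, cv in select_reference_genes(counts, lengths, ["A", "B", "C", "D"], n_refs=2):
        print(gene, round(cv, 3))
```

Requiring the cut-off in every sample is one possible design choice; a real analysis would substitute the cut-off and filtering rule actually used by the pipeline.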
format Online
Article
Text
id pubmed-6954607
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6954607 2020-01-14 Custom selected reference genes outperform pre-defined reference genes in transcriptomic analysis dos Santos, Karen Cristine Gonçalves Desgagné-Penix, Isabel Germain, Hugo BMC Genomics Methodology Article BioMed Central 2020-01-10 /pmc/articles/PMC6954607/ /pubmed/31924161 http://dx.doi.org/10.1186/s12864-019-6426-2 Text en © The Author(s). 2020, corrected publication 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Custom selected reference genes outperform pre-defined reference genes in transcriptomic analysis
topic Methodology Article