Automated identification of reference genes based on RNA-seq data

BACKGROUND: Gene expression analyses demand appropriate reference genes (RGs) for normalization, in order to obtain reliable assessments. Ideally, RG expression levels should remain constant in all cells, tissues or experimental conditions under study. Housekeeping genes traditionally fulfilled this...

Bibliographic Details
Main authors: Carmona, Rosario, Arroyo, Macarena, Jiménez-Quesada, María José, Seoane, Pedro, Zafra, Adoración, Larrosa, Rafael, Alché, Juan de Dios, Claros, M. Gonzalo
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5568602/
https://www.ncbi.nlm.nih.gov/pubmed/28830520
http://dx.doi.org/10.1186/s12938-017-0356-5
_version_ 1783258872907038720
author Carmona, Rosario
Arroyo, Macarena
Jiménez-Quesada, María José
Seoane, Pedro
Zafra, Adoración
Larrosa, Rafael
Alché, Juan de Dios
Claros, M. Gonzalo
author_facet Carmona, Rosario
Arroyo, Macarena
Jiménez-Quesada, María José
Seoane, Pedro
Zafra, Adoración
Larrosa, Rafael
Alché, Juan de Dios
Claros, M. Gonzalo
author_sort Carmona, Rosario
collection PubMed
description BACKGROUND: Gene expression analyses demand appropriate reference genes (RGs) for normalization, in order to obtain reliable assessments. Ideally, RG expression levels should remain constant in all cells, tissues or experimental conditions under study. Housekeeping genes traditionally fulfilled this requirement, but they have been reported to be less invariant than expected; therefore, RGs should be tested and validated for every particular situation. Microarray data have been used to propose new RGs, but only a limited set of model species and conditions is available; in contrast, RNA-seq experiments are increasingly frequent and constitute a new source of candidate RGs. RESULTS: An automated workflow based on mapped NGS reads has been constructed to obtain highly and invariantly expressed RGs based on a normalized expression in reads per mapped million and the coefficient of variation. This workflow has been tested with Roche/454 reads from reproductive tissues of olive tree (Olea europaea L.), as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana and three different human cancers (prostate, small-cell lung cancer and lung adenocarcinoma). Candidate RGs have been proposed for each species and many of them have been previously reported as RGs in the literature. Experimental validation of significant RGs in olive tree is provided to support the algorithm. CONCLUSION: Regardless of sequencing technology, number of replicates, and library sizes, when RNA-seq experiments are designed and performed, the same datasets can be analyzed with our workflow to extract suitable RGs for subsequent PCR validation. Moreover, different subsets of experimental conditions can provide different suitable RGs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12938-017-0356-5) contains supplementary material, which is available to authorized users.
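The selection idea summarized in the RESULTS part of the description (normalize counts to reads per mapped million, then keep genes that are both highly expressed and have a low coefficient of variation) can be illustrated with a minimal sketch. This is not the authors' workflow: the function names, the input layout (`{sample: {gene: raw_count}}`), the use of the population standard deviation, and both thresholds (`min_mean_rpm`, `max_cv`) are assumptions chosen only for illustration.

```python
from statistics import mean, pstdev


def reads_per_million(counts):
    """Scale raw counts of each sample to reads per million mapped reads.

    counts: {sample: {gene: raw_count}} (hypothetical input layout).
    """
    rpm = {}
    for sample, gene_counts in counts.items():
        total = sum(gene_counts.values())
        if total == 0:
            raise ValueError(f"sample {sample!r} has no mapped reads")
        rpm[sample] = {g: c * 1_000_000 / total for g, c in gene_counts.items()}
    return rpm


def candidate_reference_genes(counts, min_mean_rpm=100.0, max_cv=0.2):
    """Return (gene, mean_rpm, cv) tuples sorted by increasing CV.

    Both thresholds are illustrative assumptions, not values from the article.
    """
    rpm = reads_per_million(counts)
    samples = list(rpm)
    # Only genes quantified in every sample can be compared across samples.
    genes = set.intersection(*(set(rpm[s]) for s in samples))
    candidates = []
    for gene in genes:
        values = [rpm[s][gene] for s in samples]
        m = mean(values)
        if m < min_mean_rpm:
            continue  # not expressed highly enough
        cv = pstdev(values) / m  # coefficient of variation
        if cv <= max_cv:
            candidates.append((gene, m, cv))
    return sorted(candidates, key=lambda t: t[2])
```

Calling `candidate_reference_genes(counts)` on a table of mapped-read counts returns the genes that pass both filters, most stable first; any candidate found this way would still need the experimental validation the description mentions.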
format Online
Article
Text
id pubmed-5568602
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5568602 2017-08-29 Automated identification of reference genes based on RNA-seq data Carmona, Rosario Arroyo, Macarena Jiménez-Quesada, María José Seoane, Pedro Zafra, Adoración Larrosa, Rafael Alché, Juan de Dios Claros, M. Gonzalo Biomed Eng Online Research BACKGROUND: Gene expression analyses demand appropriate reference genes (RGs) for normalization, in order to obtain reliable assessments. Ideally, RG expression levels should remain constant in all cells, tissues or experimental conditions under study. Housekeeping genes traditionally fulfilled this requirement, but they have been reported to be less invariant than expected; therefore, RGs should be tested and validated for every particular situation. Microarray data have been used to propose new RGs, but only a limited set of model species and conditions is available; in contrast, RNA-seq experiments are increasingly frequent and constitute a new source of candidate RGs. RESULTS: An automated workflow based on mapped NGS reads has been constructed to obtain highly and invariantly expressed RGs based on a normalized expression in reads per mapped million and the coefficient of variation. This workflow has been tested with Roche/454 reads from reproductive tissues of olive tree (Olea europaea L.), as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana and three different human cancers (prostate, small-cell lung cancer and lung adenocarcinoma). Candidate RGs have been proposed for each species and many of them have been previously reported as RGs in the literature. Experimental validation of significant RGs in olive tree is provided to support the algorithm. CONCLUSION: Regardless of sequencing technology, number of replicates, and library sizes, when RNA-seq experiments are designed and performed, the same datasets can be analyzed with our workflow to extract suitable RGs for subsequent PCR validation. Moreover, different subsets of experimental conditions can provide different suitable RGs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12938-017-0356-5) contains supplementary material, which is available to authorized users. BioMed Central 2017-08-18 /pmc/articles/PMC5568602/ /pubmed/28830520 http://dx.doi.org/10.1186/s12938-017-0356-5 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Carmona, Rosario
Arroyo, Macarena
Jiménez-Quesada, María José
Seoane, Pedro
Zafra, Adoración
Larrosa, Rafael
Alché, Juan de Dios
Claros, M. Gonzalo
Automated identification of reference genes based on RNA-seq data
title Automated identification of reference genes based on RNA-seq data
title_full Automated identification of reference genes based on RNA-seq data
title_fullStr Automated identification of reference genes based on RNA-seq data
title_full_unstemmed Automated identification of reference genes based on RNA-seq data
title_short Automated identification of reference genes based on RNA-seq data
title_sort automated identification of reference genes based on rna-seq data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5568602/
https://www.ncbi.nlm.nih.gov/pubmed/28830520
http://dx.doi.org/10.1186/s12938-017-0356-5
work_keys_str_mv AT carmonarosario automatedidentificationofreferencegenesbasedonrnaseqdata
AT arroyomacarena automatedidentificationofreferencegenesbasedonrnaseqdata
AT jimenezquesadamariajose automatedidentificationofreferencegenesbasedonrnaseqdata
AT seoanepedro automatedidentificationofreferencegenesbasedonrnaseqdata
AT zafraadoracion automatedidentificationofreferencegenesbasedonrnaseqdata
AT larrosarafael automatedidentificationofreferencegenesbasedonrnaseqdata
AT alchejuandedios automatedidentificationofreferencegenesbasedonrnaseqdata
AT clarosmgonzalo automatedidentificationofreferencegenesbasedonrnaseqdata