
The role and robustness of the Gini coefficient as an unbiased tool for the selection of Gini genes for normalising expression profiling data

We recently introduced the Gini coefficient (GC) for assessing the expression variation of a particular gene in a dataset, as a means of selecting improved reference genes over the cohort (‘housekeeping genes’) typically used for normalisation in expression profiling studies. Those genes (transcript...


Bibliographic Details
Main authors: Wright Muelas, Marina, Mughal, Farah, O’Hagan, Steve, Day, Philip J., Kell, Douglas B.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6884504/
https://www.ncbi.nlm.nih.gov/pubmed/31784565
http://dx.doi.org/10.1038/s41598-019-54288-7
author Wright Muelas, Marina
Mughal, Farah
O’Hagan, Steve
Day, Philip J.
Kell, Douglas B.
collection PubMed
description We recently introduced the Gini coefficient (GC) for assessing the expression variation of a particular gene in a dataset, as a means of selecting improved reference genes over the cohort (‘housekeeping genes’) typically used for normalisation in expression profiling studies. Those genes (transcripts) that we determined to be useable as reference genes differed greatly from previous suggestions based on hypothesis-driven approaches. A limitation of this initial study is that a single (albeit large) dataset was employed for both tissues and cell lines. We here extend this analysis to encompass seven other large datasets. Although their absolute values differ a little, the Gini values and median expression levels of the various genes are well correlated with each other between the various cell line datasets, implying that our original choice of the more ubiquitously expressed low-Gini-coefficient genes was indeed sound. In tissues, the Gini values and median expression levels of genes showed a greater variation, with the GC of genes changing with the number and types of tissues in the data sets. In all data sets, regardless of whether they were derived from tissues or cell lines, we also show that the GC is a robust measure of gene expression stability. Using the GC as a measure of expression stability, we illustrate its utility to find tissue- and cell line-optimised housekeeping genes without any prior bias; these again include only a small number of previously reported housekeeping genes. We also independently confirmed this experimentally using RT-qPCR with 40 candidate GC genes in a panel of 10 cell lines. These were termed the Gini Genes. In many cases, the variation in the expression levels of classical reference genes is very large (e.g. 44-fold for GAPDH in one data set), suggesting that the cure (of using them as normalising genes) may in some cases be worse than the disease (of not doing so). We recommend the present data-driven approach for the selection of reference genes by using the easy-to-calculate and robust GC.
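The description above calls the GC easy to calculate. As a minimal sketch of what that might look like, the Python below computes the standard population Gini coefficient of one gene's expression values across samples and ranks genes from most to least stable. The function names `gini` and `rank_by_gini`, the gene labels and the expression values are hypothetical, and the sketch assumes non-negative, linear-scale expression values; this record does not state the exact preprocessing the authors used.

```python
from typing import Dict, List, Sequence, Tuple


def gini(values: Sequence[float]) -> float:
    """Population Gini coefficient of non-negative values.

    0 means every sample has the same value (perfectly stable expression);
    values approaching 1 mean expression is concentrated in few samples.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    total = sum(xs)
    if total == 0:
        raise ValueError("Gini coefficient is undefined when all values are zero")
    # Rank-weighted sum over the sorted values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def rank_by_gini(expression: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (gene, GC) pairs sorted from lowest to highest GC."""
    scored = [(gene, gini(vals)) for gene, vals in expression.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical expression values for two made-up genes across four samples.
example = {
    "GENE_A": [5.0, 5.0, 5.0, 5.0],   # identical in every sample
    "GENE_B": [0.0, 0.0, 0.0, 10.0],  # expressed in one sample only
}
ranking = rank_by_gini(example)
```

In this toy input the constant gene ranks first, which is the behaviour one would want from a candidate reference gene; a real analysis would likely also filter on median expression level, as the description mentions.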
format Online
Article
Text
id pubmed-6884504
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6884504 2019-12-06 The role and robustness of the Gini coefficient as an unbiased tool for the selection of Gini genes for normalising expression profiling data Wright Muelas, Marina Mughal, Farah O’Hagan, Steve Day, Philip J. Kell, Douglas B. Sci Rep Article Nature Publishing Group UK 2019-11-29 /pmc/articles/PMC6884504/ /pubmed/31784565 http://dx.doi.org/10.1038/s41598-019-54288-7 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title The role and robustness of the Gini coefficient as an unbiased tool for the selection of Gini genes for normalising expression profiling data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6884504/
https://www.ncbi.nlm.nih.gov/pubmed/31784565
http://dx.doi.org/10.1038/s41598-019-54288-7