The role and robustness of the Gini coefficient as an unbiased tool for the selection of Gini genes for normalising expression profiling data
We recently introduced the Gini coefficient (GC) for assessing the expression variation of a particular gene in a dataset, as a means of selecting improved reference genes over the cohort (‘housekeeping genes’) typically used for normalisation in expression profiling studies. Those genes (transcript...
Main authors: | Wright Muelas, Marina; Mughal, Farah; O’Hagan, Steve; Day, Philip J.; Kell, Douglas B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6884504/ https://www.ncbi.nlm.nih.gov/pubmed/31784565 http://dx.doi.org/10.1038/s41598-019-54288-7 |
_version_ | 1783474562511405056 |
---|---|
author | Wright Muelas, Marina Mughal, Farah O’Hagan, Steve Day, Philip J. Kell, Douglas B. |
author_facet | Wright Muelas, Marina Mughal, Farah O’Hagan, Steve Day, Philip J. Kell, Douglas B. |
author_sort | Wright Muelas, Marina |
collection | PubMed |
description | We recently introduced the Gini coefficient (GC) for assessing the expression variation of a particular gene in a dataset, as a means of selecting improved reference genes over the cohort (‘housekeeping genes’) typically used for normalisation in expression profiling studies. Those genes (transcripts) that we determined to be useable as reference genes differed greatly from previous suggestions based on hypothesis-driven approaches. A limitation of this initial study is that a single (albeit large) dataset was employed for both tissues and cell lines. We here extend this analysis to encompass seven other large datasets. Although their absolute values differ a little, the Gini values and median expression levels of the various genes are well correlated with each other between the various cell line datasets, implying that our original choice of the more ubiquitously expressed low-Gini-coefficient genes was indeed sound. In tissues, the Gini values and median expression levels of genes showed a greater variation, with the GC of genes changing with the number and types of tissues in the data sets. In all data sets, regardless of whether they were derived from tissues or cell lines, we also show that the GC is a robust measure of gene expression stability. Using the GC as a measure of expression stability we illustrate its utility to find tissue- and cell line-optimised housekeeping genes without any prior bias, which again include only a small number of previously reported housekeeping genes. We also independently confirmed this experimentally using RT-qPCR with 40 candidate GC genes in a panel of 10 cell lines. These were termed the Gini Genes. In many cases, the variation in the expression levels of classical reference genes is very large (e.g. 44 fold for GAPDH in one data set), suggesting that the cure (of using them as normalising genes) may in some cases be worse than the disease (of not doing so). We recommend the present data-driven approach for the selection of reference genes by using the easy-to-calculate and robust GC. |
format | Online Article Text |
id | pubmed-6884504 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6884504 2019-12-06 The role and robustness of the Gini coefficient as an unbiased tool for the selection of Gini genes for normalising expression profiling data Wright Muelas, Marina Mughal, Farah O’Hagan, Steve Day, Philip J. Kell, Douglas B. Sci Rep Article Nature Publishing Group UK 2019-11-29 /pmc/articles/PMC6884504/ /pubmed/31784565 http://dx.doi.org/10.1038/s41598-019-54288-7 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Wright Muelas, Marina Mughal, Farah O’Hagan, Steve Day, Philip J. Kell, Douglas B. The role and robustness of the Gini coefficient as an unbiased tool for the selection of Gini genes for normalising expression profiling data |
title | The role and robustness of the Gini coefficient as an unbiased tool for the selection of Gini genes for normalising expression profiling data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6884504/ https://www.ncbi.nlm.nih.gov/pubmed/31784565 http://dx.doi.org/10.1038/s41598-019-54288-7 |
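The description above calls the GC "easy-to-calculate" and uses it to rank genes by expression stability. As a minimal sketch of that idea, the Python below computes the standard sample Gini coefficient for each gene's expression values and sorts genes from most to least stable. This record does not state which estimator the authors used, so the formula here is an assumption; the function names `gini` and `rank_by_gini`, the gene labels, and all expression values are hypothetical and for illustration only.

```python
def gini(values):
    """Sample Gini coefficient of non-negative values.

    0 means every sample has the same value (perfectly stable expression);
    values approaching 1 mean expression is concentrated in a few samples.
    Assumption: standard sorted-rank formula, not confirmed as the
    estimator used in the catalogued article.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    if xs[0] < 0:
        raise ValueError("expression values must be non-negative")
    total = sum(xs)
    if total == 0:
        raise ValueError("Gini coefficient is undefined when all values are zero")
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, i = 1..n over sorted x
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def rank_by_gini(expression):
    """Return (gene, GC) pairs sorted from lowest GC (most stable) upwards."""
    scored = [(gene, gini(levels)) for gene, levels in expression.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical expression levels across four samples (made-up numbers).
    expression = {
        "GENE_A": [10, 11, 9, 10],   # nearly constant: low GC
        "GENE_B": [1, 50, 2, 100],   # highly variable: high GC
    }
    for gene, gc in rank_by_gini(expression):
        print(f"{gene}\tGC={gc:.3f}")
```

Under this sketch, genes at the top of the ranking (lowest GC) would be the candidate reference genes; in practice one would also filter on median expression level, as the description notes, so that stable but barely expressed genes are not selected.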