
Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression

When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called “length bias”, will influence subsequent analyses such as Gene Ontology enr...


Bibliographic Details
Main authors: Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S.; Chang, Jeff H.
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3462807/
https://www.ncbi.nlm.nih.gov/pubmed/23056249
http://dx.doi.org/10.1371/journal.pone.0046128
_version_ 1782245217834893312
author Mi, Gu
Di, Yanming
Emerson, Sarah
Cumbie, Jason S.
Chang, Jeff H.
author_facet Mi, Gu
Di, Yanming
Emerson, Sarah
Cumbie, Jason S.
Chang, Jeff H.
author_sort Mi, Gu
collection PubMed
description When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called “length bias”, will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
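The adjustment described in the abstract can be illustrated with a small simulated sketch. This is not the authors' code. It assumes one possible formulation, in which Gene Ontology membership is the binary response, a differential expression significance score is the predictor of interest, and log transcript length is the added covariate. The function `fit_logistic`, all variable names, and every simulation parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # fitted probabilities
        w = p * (1.0 - p)                      # Bernoulli variances
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * w[:, None])          # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Simulated genes (assumed setup): length drives both the DE score and
# the category membership, so it acts as a confounder.
rng = np.random.default_rng(0)
n_genes = 5000
log_length = rng.normal(size=n_genes)               # standardized log length
de_score = 1.0 * log_length + rng.normal(size=n_genes)  # longer => more significant
p_member = 1.0 / (1.0 + np.exp(-(-2.0 + 1.0 * log_length)))
in_category = (rng.random(n_genes) < p_member).astype(float)

# Unadjusted model: membership ~ DE score
beta_unadjusted = fit_logistic(de_score, in_category)

# Adjusted model: membership ~ DE score + log transcript length
beta_adjusted = fit_logistic(
    np.column_stack([de_score, log_length]), in_category
)

print("DE score coefficient, unadjusted:", beta_unadjusted[1])
print("DE score coefficient, adjusted:  ", beta_adjusted[1])
print("Length coefficient, adjusted:    ", beta_adjusted[2])
```

In this simulation, membership depends on length alone, so the score coefficient should be clearly positive in the unadjusted fit and close to zero once length is included. That is the confounding pattern the abstract describes.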
format Online
Article
Text
id pubmed-3462807
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-34628072012-10-10 Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression Mi, Gu Di, Yanming Emerson, Sarah Cumbie, Jason S. Chang, Jeff H. PLoS One Research Article When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called “length bias”, will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible. Public Library of Science 2012-10-02 /pmc/articles/PMC3462807/ /pubmed/23056249 http://dx.doi.org/10.1371/journal.pone.0046128 Text en © 2012 Mi et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Mi, Gu
Di, Yanming
Emerson, Sarah
Cumbie, Jason S.
Chang, Jeff H.
Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression
title Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression
title_full Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression
title_fullStr Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression
title_full_unstemmed Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression
title_short Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression
title_sort length bias correction in gene ontology enrichment analysis using logistic regression
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3462807/
https://www.ncbi.nlm.nih.gov/pubmed/23056249
http://dx.doi.org/10.1371/journal.pone.0046128
work_keys_str_mv AT migu lengthbiascorrectioningeneontologyenrichmentanalysisusinglogisticregression
AT diyanming lengthbiascorrectioningeneontologyenrichmentanalysisusinglogisticregression
AT emersonsarah lengthbiascorrectioningeneontologyenrichmentanalysisusinglogisticregression
AT cumbiejasons lengthbiascorrectioningeneontologyenrichmentanalysisusinglogisticregression
AT changjeffh lengthbiascorrectioningeneontologyenrichmentanalysisusinglogisticregression