Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data

BACKGROUND: A major goal of molecular biology is determining the mechanisms that control the transcription of genes. Motif Enrichment Analysis (MEA) seeks to determine which DNA-binding transcription factors control the transcription of a set of genes by detecting enrichment of known binding motifs...

Bibliographic Details
Main Authors: McLeay, Robert C, Bailey, Timothy L
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2868005/
https://www.ncbi.nlm.nih.gov/pubmed/20356413
http://dx.doi.org/10.1186/1471-2105-11-165
author McLeay, Robert C
Bailey, Timothy L
collection PubMed
description BACKGROUND: A major goal of molecular biology is determining the mechanisms that control the transcription of genes. Motif Enrichment Analysis (MEA) seeks to determine which DNA-binding transcription factors control the transcription of a set of genes by detecting enrichment of known binding motifs in the genes' regulatory regions. Typically, the biologist specifies a set of genes believed to be co-regulated and a library of known DNA-binding models for transcription factors, and MEA determines which (if any) of the factors may be direct regulators of the genes. Since the number of factors with known DNA-binding models is rapidly increasing as a result of high-throughput technologies, MEA is becoming increasingly useful. In this paper, we explore ways to make MEA applicable in more settings, and evaluate the efficacy of a number of MEA approaches. RESULTS: We first define a mathematical framework for Motif Enrichment Analysis that relaxes the requirement that the biologist input a selected set of genes. Instead, the input consists of all regulatory regions, each labeled with the level of a biological signal. We then define and implement a number of motif enrichment analysis methods. Some of these methods require a user-specified signal threshold, some identify an optimum threshold in a data-driven way, and two of our methods are threshold-free. We evaluate these methods, along with two existing methods (Clover and PASTAA), using yeast ChIP-chip data. Our novel threshold-free method based on linear regression performs best in our evaluation, followed by the data-driven PASTAA algorithm. The Clover algorithm performs as well as PASTAA if the user-specified threshold is chosen optimally. Data-driven methods based on three statistical tests (Fisher Exact Test, rank-sum test, and multi-hypergeometric test) perform poorly, even when the threshold is chosen optimally. These methods (and Clover) perform even worse when unrestricted data-driven threshold determination is used. CONCLUSIONS: Our novel, threshold-free linear regression method works well on ChIP-chip data. Methods using data-driven threshold determination can perform poorly unless the range of thresholds is limited a priori. The limits implemented in PASTAA, however, appear to be well-chosen. Our novel algorithms, AME (Analysis of Motif Enrichment), are available at http://bioinformatics.org.au/ame/.
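The contrast the abstract draws between threshold-based and threshold-free methods can be illustrated with a minimal Python sketch. This is not the AME implementation: the function names, the toy data, and the choice to regress the signal on a per-region motif score are assumptions made only for illustration, since the record does not say how the authors set up their tests.

```python
from math import comb


def fisher_enrichment_p(signals, has_motif, threshold):
    """One-sided Fisher exact test (hypothetical helper, not AME's code).

    Regions with signal >= threshold form the selected set; the p-value is
    the hypergeometric upper tail for the number of selected regions that
    contain the motif.
    """
    n = len(signals)
    selected = [s >= threshold for s in signals]
    k = sum(selected)                      # regions at or above the threshold
    m = sum(has_motif)                     # regions containing the motif
    x = sum(1 for sel, hit in zip(selected, has_motif) if sel and hit)
    tail = sum(comb(m, i) * comb(n - m, k - i) for i in range(x, min(m, k) + 1))
    return tail / comb(n, k)


def regression_slope(motif_scores, signals):
    """Threshold-free alternative (assumed setup): least-squares slope of
    the signal on a per-region motif score. No signal cutoff is needed."""
    n = len(signals)
    mean_x = sum(motif_scores) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in motif_scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(motif_scores, signals))
    return sxy / sxx


# Toy data (invented): six regulatory regions, each labeled with a signal level.
signals = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5]
has_motif = [True, True, True, False, False, False]
p_value = fisher_enrichment_p(signals, has_motif, threshold=2.5)

# Toy data (invented): the signal rises linearly with the motif score.
slope = regression_slope([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The Fisher-style test depends on where the threshold is placed, which is the sensitivity the abstract reports for the data-driven threshold methods; the regression variant uses every region's signal directly.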
format Text
id pubmed-2868005
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2868005 2010-05-12 Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data McLeay, Robert C Bailey, Timothy L BMC Bioinformatics Research article BioMed Central 2010-04-01 /pmc/articles/PMC2868005/ /pubmed/20356413 http://dx.doi.org/10.1186/1471-2105-11-165 Text en Copyright ©2010 McLeay and Bailey; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2868005/
https://www.ncbi.nlm.nih.gov/pubmed/20356413
http://dx.doi.org/10.1186/1471-2105-11-165