Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data
BACKGROUND: A major goal of molecular biology is determining the mechanisms that control the transcription of genes. Motif Enrichment Analysis (MEA) seeks to determine which DNA-binding transcription factors control the transcription of a set of genes by detecting enrichment of known binding motifs...
Main authors: | McLeay, Robert C; Bailey, Timothy L |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | Research article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2868005/ https://www.ncbi.nlm.nih.gov/pubmed/20356413 http://dx.doi.org/10.1186/1471-2105-11-165 |
_version_ | 1782181024778682368 |
---|---|
author | McLeay, Robert C Bailey, Timothy L |
author_facet | McLeay, Robert C Bailey, Timothy L |
author_sort | McLeay, Robert C |
collection | PubMed |
description | BACKGROUND: A major goal of molecular biology is determining the mechanisms that control the transcription of genes. Motif Enrichment Analysis (MEA) seeks to determine which DNA-binding transcription factors control the transcription of a set of genes by detecting enrichment of known binding motifs in the genes' regulatory regions. Typically, the biologist specifies a set of genes believed to be co-regulated and a library of known DNA-binding models for transcription factors, and MEA determines which (if any) of the factors may be direct regulators of the genes. Since the number of factors with known DNA-binding models is rapidly increasing as a result of high-throughput technologies, MEA is becoming increasingly useful. In this paper, we explore ways to make MEA applicable in more settings, and evaluate the efficacy of a number of MEA approaches. RESULTS: We first define a mathematical framework for Motif Enrichment Analysis that relaxes the requirement that the biologist input a selected set of genes. Instead, the input consists of all regulatory regions, each labeled with the level of a biological signal. We then define and implement a number of motif enrichment analysis methods. Some of these methods require a user-specified signal threshold, some identify an optimum threshold in a data-driven way and two of our methods are threshold-free. We evaluate these methods, along with two existing methods (Clover and PASTAA), using yeast ChIP-chip data. Our novel threshold-free method based on linear regression performs best in our evaluation, followed by the data-driven PASTAA algorithm. The Clover algorithm performs as well as PASTAA if the user-specified threshold is chosen optimally. Data-driven methods based on three statistical tests (Fisher Exact Test, rank-sum test, and multi-hypergeometric test) perform poorly, even when the threshold is chosen optimally. These methods (and Clover) perform even worse when unrestricted data-driven threshold determination is used. CONCLUSIONS: Our novel, threshold-free linear regression method works well on ChIP-chip data. Methods using data-driven threshold determination can perform poorly unless the range of thresholds is limited a priori. The limits implemented in PASTAA, however, appear to be well-chosen. Our novel algorithms, AME (Analysis of Motif Enrichment), are available at http://bioinformatics.org.au/ame/. |
format | Text |
id | pubmed-2868005 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2868005 2010-05-12 Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data McLeay, Robert C Bailey, Timothy L BMC Bioinformatics Research article BACKGROUND: A major goal of molecular biology is determining the mechanisms that control the transcription of genes. Motif Enrichment Analysis (MEA) seeks to determine which DNA-binding transcription factors control the transcription of a set of genes by detecting enrichment of known binding motifs in the genes' regulatory regions. Typically, the biologist specifies a set of genes believed to be co-regulated and a library of known DNA-binding models for transcription factors, and MEA determines which (if any) of the factors may be direct regulators of the genes. Since the number of factors with known DNA-binding models is rapidly increasing as a result of high-throughput technologies, MEA is becoming increasingly useful. In this paper, we explore ways to make MEA applicable in more settings, and evaluate the efficacy of a number of MEA approaches. RESULTS: We first define a mathematical framework for Motif Enrichment Analysis that relaxes the requirement that the biologist input a selected set of genes. Instead, the input consists of all regulatory regions, each labeled with the level of a biological signal. We then define and implement a number of motif enrichment analysis methods. Some of these methods require a user-specified signal threshold, some identify an optimum threshold in a data-driven way and two of our methods are threshold-free. We evaluate these methods, along with two existing methods (Clover and PASTAA), using yeast ChIP-chip data. Our novel threshold-free method based on linear regression performs best in our evaluation, followed by the data-driven PASTAA algorithm. The Clover algorithm performs as well as PASTAA if the user-specified threshold is chosen optimally. Data-driven methods based on three statistical tests (Fisher Exact Test, rank-sum test, and multi-hypergeometric test) perform poorly, even when the threshold is chosen optimally. These methods (and Clover) perform even worse when unrestricted data-driven threshold determination is used. CONCLUSIONS: Our novel, threshold-free linear regression method works well on ChIP-chip data. Methods using data-driven threshold determination can perform poorly unless the range of thresholds is limited a priori. The limits implemented in PASTAA, however, appear to be well-chosen. Our novel algorithms, AME (Analysis of Motif Enrichment), are available at http://bioinformatics.org.au/ame/. BioMed Central 2010-04-01 /pmc/articles/PMC2868005/ /pubmed/20356413 http://dx.doi.org/10.1186/1471-2105-11-165 Text en Copyright ©2010 McLeay and Bailey; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research article McLeay, Robert C Bailey, Timothy L Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data |
title | Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data |
title_full | Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data |
title_fullStr | Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data |
title_full_unstemmed | Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data |
title_short | Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data |
title_sort | motif enrichment analysis: a unified framework and an evaluation on chip data |
topic | Research article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2868005/ https://www.ncbi.nlm.nih.gov/pubmed/20356413 http://dx.doi.org/10.1186/1471-2105-11-165 |
work_keys_str_mv | AT mcleayrobertc motifenrichmentanalysisaunifiedframeworkandanevaluationonchipdata AT baileytimothyl motifenrichmentanalysisaunifiedframeworkandanevaluationonchipdata |
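
The abstract in the record above contrasts threshold-based enrichment tests with a threshold-free approach based on linear regression. The following is a minimal sketch of that contrast, not the AME implementation: the function names, the input layout (one signal value and one motif score or hit flag per regulatory region), and the use of a plain least-squares slope are all assumptions made for illustration, since the record does not specify how the paper scores regions or fits its regression.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: regions at/above vs. below the signal threshold.
    Columns: regions with vs. without a motif hit.
    Returns P(X >= a) under the hypergeometric null.
    """
    n_above = a + b          # regions at or above the threshold
    n_hits = a + c           # regions containing the motif
    total = a + b + c + d
    numerator = 0
    for k in range(a, min(n_above, n_hits) + 1):
        numerator += comb(n_hits, k) * comb(total - n_hits, n_above - k)
    return numerator / comb(total, n_above)


def threshold_enrichment(signals, motif_hits, threshold):
    """Threshold-based test (illustrative): split regions at `threshold`,
    then ask whether motif hits are over-represented in the high-signal set."""
    a = b = c = d = 0
    for signal, hit in zip(signals, motif_hits):
        if signal >= threshold:
            if hit:
                a += 1
            else:
                b += 1
        else:
            if hit:
                c += 1
            else:
                d += 1
    return fisher_exact_greater(a, b, c, d)


def regression_slope(motif_scores, signals):
    """Threshold-free alternative (illustrative): least-squares slope of
    signal on motif score across all regions. No threshold is needed."""
    n = len(signals)
    mean_x = sum(motif_scores) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in motif_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(motif_scores, signals))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical toy data: eight regions with a ChIP-like signal each.
    signals = [9, 8, 7, 6, 1, 2, 3, 4]
    hits = [True, True, True, False, True, False, False, False]
    print(threshold_enrichment(signals, hits, threshold=5))
    print(regression_slope([0, 1, 2, 3], [1, 3, 5, 7]))
```

The threshold-based function depends on the caller's choice of `threshold`, which is the sensitivity the abstract reports for such methods; the regression function uses every region's signal directly.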