Occupancy Classification of Position Weight Matrix-Inferred Transcription Factor Binding Sites

BACKGROUND: Computational prediction of Transcription Factor Binding Sites (TFBS) from sequence data alone is difficult and error-prone. Machine learning techniques utilizing additional environmental information about a predicted binding site (such as distances from the site to particular chromatin...
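The summary above mentions distances from a predicted binding site to chromatin features as classifier inputs. The following is a minimal Python sketch of how such a distance feature might be computed. It is not the authors' implementation: the function names, the interval representation (sorted, non-overlapping `(start, end)` tuples on a single chromosome), and the track names `mark_A` and `mark_B` are all assumptions made for illustration.

```python
from bisect import bisect_left


def distance_to_nearest_island(site_pos, islands):
    """Distance in bp from site_pos to the nearest interval in `islands`.

    Assumes (hypothetically) that `islands` is a list of (start, end)
    tuples on one chromosome, sorted by start and non-overlapping.
    Returns 0 when the site lies inside an island.
    """
    if not islands:
        raise ValueError("islands must contain at least one interval")
    starts = [start for start, _ in islands]
    i = bisect_left(starts, site_pos)
    candidates = []
    if i < len(islands):
        # First island starting at or after the site.
        candidates.append(islands[i][0] - site_pos)
    if i > 0:
        # Last island starting before the site; the site may fall inside it.
        candidates.append(max(0, site_pos - islands[i - 1][1]))
    return min(candidates)


def site_features(site_pos, tracks):
    """Build one distance feature per named track (track names are illustrative)."""
    return {
        f"dist_{name}": distance_to_nearest_island(site_pos, islands)
        for name, islands in tracks.items()
    }


if __name__ == "__main__":
    tracks = {"mark_A": [(100, 200), (500, 650)], "mark_B": [(1000, 1200)]}
    print(site_features(300, tracks))
```

A feature dictionary like this could be passed to any standard classifier; which features matter, and which classifier performs best, is what the article itself evaluates.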

Bibliographic Details
Main authors: Wright, Hollis, Cohen, Aaron, Sönmez, Kemal, Yochum, Gregory, McWeeney, Shannon
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3208542/
https://www.ncbi.nlm.nih.gov/pubmed/22073148
http://dx.doi.org/10.1371/journal.pone.0026160
_version_ 1782215627637784576
author Wright, Hollis
Cohen, Aaron
Sönmez, Kemal
Yochum, Gregory
McWeeney, Shannon
author_sort Wright, Hollis
collection PubMed
description BACKGROUND: Computational prediction of Transcription Factor Binding Sites (TFBS) from sequence data alone is difficult and error-prone. Machine learning techniques that use additional environmental information about a predicted binding site (such as distances from the site to particular chromatin features) to determine its occupancy/functionality class show promise as methods for more accurate in silico prediction of true TFBS. We evaluate the Bayesian Network (BN) and Support Vector Machine (SVM) machine learning techniques on four distinct TFBS data sets and analyze their performance. We describe the features that are most useful for classification and compare and contrast these feature sets across the factors. RESULTS: Our results demonstrate good performance of classifiers both on TFBS for transcription factors used for initial training and on TFBS for other factors in cross-classification experiments. We find distances to chromatin modifications (specifically, histone modification islands), as well as distances between such modifications, to be effective predictors of TFBS occupancy, though the impact of individual predictors is largely TF-specific. In our experiments, Bayesian network classifiers outperform SVM classifiers. CONCLUSIONS: Our results demonstrate good performance of machine learning techniques on the problem of occupancy classification, and show that effective classification can be achieved using distances to chromatin features. We additionally demonstrate that cross-classification of TFBS is possible, suggesting that a generalizable occupancy classifier capable of handling TFBS for many different transcription factors could be constructed.
format Online
Article
Text
id pubmed-3208542
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3208542 2011-11-09 Occupancy Classification of Position Weight Matrix-Inferred Transcription Factor Binding Sites. Wright, Hollis; Cohen, Aaron; Sönmez, Kemal; Yochum, Gregory; McWeeney, Shannon. PLoS One. Research Article.
Public Library of Science 2011-11-04 /pmc/articles/PMC3208542/ /pubmed/22073148 http://dx.doi.org/10.1371/journal.pone.0026160 Text en Wright et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Wright, Hollis
Cohen, Aaron
Sönmez, Kemal
Yochum, Gregory
McWeeney, Shannon
Occupancy Classification of Position Weight Matrix-Inferred Transcription Factor Binding Sites
title Occupancy Classification of Position Weight Matrix-Inferred Transcription Factor Binding Sites
title_sort occupancy classification of position weight matrix-inferred transcription factor binding sites
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3208542/
https://www.ncbi.nlm.nih.gov/pubmed/22073148
http://dx.doi.org/10.1371/journal.pone.0026160