Searching for transcription factor binding sites in vector spaces

Bibliographic Details
Main authors: Lee, Chih; Huang, Chun-Hsi
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2012
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3543194/
https://www.ncbi.nlm.nih.gov/pubmed/23244338
http://dx.doi.org/10.1186/1471-2105-13-215
author Lee, Chih
Huang, Chun-Hsi
collection PubMed
description BACKGROUND: Computational approaches to transcription factor binding site identification have been actively researched in the past decade. By learning from known binding sites, one can identify new binding sites of a transcription factor in unannotated sequences. A number of search methods have been introduced over the years. However, one can rarely find a single method that performs best on all transcription factors. Instead, to identify the best method for a particular transcription factor, one usually has to compare a handful of methods. Hence, it is highly desirable for a method to perform automatic optimization for individual transcription factors.
RESULTS: We proposed to search for transcription factor binding sites in vector spaces. This framework allows us to identify the best method for each individual transcription factor. We further introduced two novel methods, the negative-to-positive vector (NPV) and optimal discriminating vector (ODV) methods, to construct query vectors to search for binding sites in vector spaces. Extensive cross-validation experiments showed that the proposed methods significantly outperformed both the ungapped likelihood under positional background method (a state-of-the-art method) and the widely used position-specific scoring matrix method. We further demonstrated that motif subtypes of a transcription factor (TF) can be readily identified in this framework and that two variants, called the kNPV and kODV methods, benefited significantly from motif subtype identification. Finally, independent validation on ChIP-seq data showed that the ODV and NPV methods significantly outperformed the other methods compared.
CONCLUSIONS: We conclude that the proposed framework is highly flexible. It enables the two novel methods to automatically identify a TF-specific subspace to search for binding sites. Implementations are available as source code at: http://biogrid.engr.uconn.edu/tfbs_search/.
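The abstract describes the vector-space framework only at a high level. The following minimal Python sketch is an illustration under stated assumptions, not the authors' implementation (their source code is at the URL given in the abstract). It assumes that fixed-width sites are one-hot encoded as 4L-dimensional vectors and scored by their inner product with a query vector. Under that assumption it shows that a conventional PSSM score is such an inner product, and it builds a hypothetical NPV-style query as the centroid of the positive sites minus the centroid of the negative sites. The encoding, the centroid-difference rule, and every function name here (one_hot, pssm_to_query, npv_like_query, scan) are assumptions; the published NPV and ODV constructions may differ.

```python
from typing import Dict, List, Sequence, Tuple

ALPHABET = "ACGT"  # fixed base order for every 4-slot block of the encoding


def one_hot(site: str) -> List[float]:
    """Encode a DNA site of length L as a 4L-dimensional 0/1 vector (assumed encoding)."""
    vec: List[float] = []
    for base in site.upper():
        if base not in ALPHABET:
            raise ValueError(f"unsupported base: {base!r}")
        vec.extend(1.0 if base == b else 0.0 for b in ALPHABET)
    return vec


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product used as the similarity between a query vector and a site."""
    return sum(a * b for a, b in zip(u, v))


def pssm_score(site: str, weights: Sequence[Dict[str, float]]) -> float:
    """Conventional PSSM score: sum of the per-position weights of the observed bases."""
    return sum(row[base] for row, base in zip(weights, site.upper()))


def pssm_to_query(weights: Sequence[Dict[str, float]]) -> List[float]:
    """Flatten a PSSM into a query vector, so pssm_score equals dot(query, one_hot(site))."""
    return [row[b] for row in weights for b in ALPHABET]


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty set of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def npv_like_query(positives: Sequence[str], negatives: Sequence[str]) -> List[float]:
    """HYPOTHETICAL NPV-style query: centroid of positives minus centroid of negatives."""
    pos = centroid([one_hot(s) for s in positives])
    neg = centroid([one_hot(s) for s in negatives])
    return [p - n for p, n in zip(pos, neg)]


def scan(sequence: str, query: Sequence[float], width: int) -> List[Tuple[int, float]]:
    """Score every window of `width` bases; return (start, score) pairs, best first."""
    if len(query) != 4 * width:
        raise ValueError("query length must be 4 * width")
    hits = [
        (start, dot(query, one_hot(sequence[start:start + width])))
        for start in range(len(sequence) - width + 1)
    ]
    # sorted() is stable, so windows with equal scores keep their left-to-right order
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

For example, building query = npv_like_query(["ACG", "ACG", "ACT"], ["TTT", "GGG"]) and calling scan("TTACGA", query, 3) ranks the window starting at index 2 ("ACG") first. Passing pssm_to_query(weights) as the query instead scans with an ordinary PSSM, which illustrates how a PSSM can be treated as one particular query vector in this kind of space.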
format Online
Article
Text
id pubmed-3543194
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3543194 2013-01-14
BMC Bioinformatics, Methodology Article
BioMed Central, 2012-08-27
Copyright ©2012 Lee and Huang; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Searching for transcription factor binding sites in vector spaces
topic Methodology Article