LASAGNA: A novel algorithm for transcription factor binding site alignment

BACKGROUND: Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as...

Bibliographic Details
Main Authors: Lee, Chih, Huang, Chun-Hsi
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3747862/
https://www.ncbi.nlm.nih.gov/pubmed/23522376
http://dx.doi.org/10.1186/1471-2105-14-108
_version_ 1782280989640228864
author Lee, Chih
Huang, Chun-Hsi
author_facet Lee, Chih
Huang, Chun-Hsi
author_sort Lee, Chih
collection PubMed
description BACKGROUND: Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs. RESULTS: We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for ChIP (Chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed comparable performance with MEME in discovering motifs in ChIP-seq peak sequences. CONCLUSIONS: We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The webtool is available at http://biogrid.engr.uconn.edu/lasagna_search/.
format Online
Article
Text
id pubmed-3747862
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3747862 2013-08-22 LASAGNA: A novel algorithm for transcription factor binding site alignment Lee, Chih Huang, Chun-Hsi BMC Bioinformatics Methodology Article BioMed Central 2013-03-24 /pmc/articles/PMC3747862/ /pubmed/23522376 http://dx.doi.org/10.1186/1471-2105-14-108 Text en Copyright © 2013 Lee and Huang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Lee, Chih
Huang, Chun-Hsi
LASAGNA: A novel algorithm for transcription factor binding site alignment
title LASAGNA: A novel algorithm for transcription factor binding site alignment
title_full LASAGNA: A novel algorithm for transcription factor binding site alignment
title_fullStr LASAGNA: A novel algorithm for transcription factor binding site alignment
title_full_unstemmed LASAGNA: A novel algorithm for transcription factor binding site alignment
title_short LASAGNA: A novel algorithm for transcription factor binding site alignment
title_sort lasagna: a novel algorithm for transcription factor binding site alignment
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3747862/
https://www.ncbi.nlm.nih.gov/pubmed/23522376
http://dx.doi.org/10.1186/1471-2105-14-108
work_keys_str_mv AT leechih lasagnaanovelalgorithmfortranscriptionfactorbindingsitealignment
AT huangchunhsi lasagnaanovelalgorithmfortranscriptionfactorbindingsitealignment