
The twilight zone of cis element alignments

Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed ye...


Bibliographic Details
Main Authors: Sebastian, Alvaro, Contreras-Moreira, Bruno
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3561995/
https://www.ncbi.nlm.nih.gov/pubmed/23268451
http://dx.doi.org/10.1093/nar/gks1301
author Sebastian, Alvaro
Contreras-Moreira, Bruno
collection PubMed
description Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein–DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein–DNA interfaces is significantly more conserved than their sequence, and we measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.
format Online
Article
Text
id pubmed-3561995
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3561995 2013-02-01 The twilight zone of cis element alignments Sebastian, Alvaro Contreras-Moreira, Bruno Nucleic Acids Res Computational Biology Oxford University Press 2013-02 2012-12-24 /pmc/articles/PMC3561995/ /pubmed/23268451 http://dx.doi.org/10.1093/nar/gks1301 Text en © The Author(s) 2012. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
title The twilight zone of cis element alignments
topic Computational Biology