
Design of shortest double-stranded DNA sequences covering all k-mers with applications to protein-binding microarrays and synthetic enhancers

Motivation: Novel technologies can generate large sets of short double-stranded DNA sequences that can be used to measure their regulatory effects. Microarrays can measure in vitro the binding intensity of a protein to thousands of probes. Synthetic enhancer sequences inserted into an organism’s gen...


Bibliographic Details
Main Authors: Orenstein, Yaron; Shamir, Ron
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694677/
https://www.ncbi.nlm.nih.gov/pubmed/23813011
http://dx.doi.org/10.1093/bioinformatics/btt230
author Orenstein, Yaron
Shamir, Ron
collection PubMed
description Motivation: Novel technologies can generate large sets of short double-stranded DNA sequences that can be used to measure their regulatory effects. Microarrays can measure in vitro the binding intensity of a protein to thousands of probes. Synthetic enhancer sequences inserted into an organism’s genome allow us to measure in vivo the effect of such sequences on the phenotype. In both applications, by using sequence probes that cover all k-mers, a comprehensive picture of the effect of all possible short sequences on gene regulation is obtained. The value of k that can be used in practice is, however, severely limited by cost and space considerations. A key challenge is, therefore, to cover all k-mers with a minimal number of probes. The standard way to do this uses the de Bruijn sequence of length $4^k$. However, as probes are double stranded, when a k-mer is included in a probe, its reverse complement k-mer is accounted for as well. Results: Here, we show how to efficiently create a shortest possible sequence with the property that it contains each k-mer or its reverse complement, but not necessarily both. The length of the resulting sequence approaches half that of the de Bruijn sequence as k increases, resulting in a more efficient array that allows covering longer sequences; alternatively, additional sequences with redundant k-mers of interest can be added. Availability: The software is freely available from our website http://acgt.cs.tau.ac.il/shortcake/. Contact: rshamir@tau.ac.il
format Online
Article
Text
id pubmed-3694677
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3694677 2013-06-27 Design of shortest double-stranded DNA sequences covering all k-mers with applications to protein-binding microarrays and synthetic enhancers Orenstein, Yaron Shamir, Ron Bioinformatics Ismb/Eccb 2013 Proceedings Papers Committee July 21 to July 23, 2013, Berlin, Germany Motivation: Novel technologies can generate large sets of short double-stranded DNA sequences that can be used to measure their regulatory effects. Microarrays can measure in vitro the binding intensity of a protein to thousands of probes. Synthetic enhancer sequences inserted into an organism’s genome allow us to measure in vivo the effect of such sequences on the phenotype. In both applications, by using sequence probes that cover all k-mers, a comprehensive picture of the effect of all possible short sequences on gene regulation is obtained. The value of k that can be used in practice is, however, severely limited by cost and space considerations. A key challenge is, therefore, to cover all k-mers with a minimal number of probes. The standard way to do this uses the de Bruijn sequence of length $4^k$. However, as probes are double stranded, when a k-mer is included in a probe, its reverse complement k-mer is accounted for as well. Results: Here, we show how to efficiently create a shortest possible sequence with the property that it contains each k-mer or its reverse complement, but not necessarily both. The length of the resulting sequence approaches half that of the de Bruijn sequence as k increases, resulting in a more efficient array that allows covering longer sequences; alternatively, additional sequences with redundant k-mers of interest can be added. Availability: The software is freely available from our website http://acgt.cs.tau.ac.il/shortcake/. Contact: rshamir@tau.ac.il Oxford University Press 2013-07-01 2013-06-19 /pmc/articles/PMC3694677/ /pubmed/23813011 http://dx.doi.org/10.1093/bioinformatics/btt230 Text en © The Author 2013. 
Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Design of shortest double-stranded DNA sequences covering all k-mers with applications to protein-binding microarrays and synthetic enhancers
topic Ismb/Eccb 2013 Proceedings Papers Committee July 21 to July 23, 2013, Berlin, Germany
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694677/
https://www.ncbi.nlm.nih.gov/pubmed/23813011
http://dx.doi.org/10.1093/bioinformatics/btt230
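The comparison made in the abstract, between a de Bruijn sequence that covers every k-mer and a sequence that only needs each k-mer or its reverse complement, can be illustrated with a small sketch. This is not the authors' algorithm or their software: the function names (`reverse_complement`, `de_bruijn`, `covered_kmers`, `canonical_classes`) are hypothetical, and the de Bruijn generator is the standard textbook Lyndon-word construction, swapped in here only to have a concrete covering sequence to inspect. The sketch counts how many reverse-complement classes exist for a given k, which gives a sense of why roughly half the length could suffice.

```python
from itertools import product

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def de_bruijn(alphabet, k):
    """Cyclic de Bruijn sequence of order k (standard Lyndon-word construction)."""
    n = len(alphabet)
    a = [0] * (k + 1)
    out = []

    def db(t, p):
        if t > k:
            # Emit the Lyndon word prefix only when its length divides k.
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def covered_kmers(seq, k):
    """Set of k-mers that occur in a linear sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def canonical_classes(k):
    """Number of k-mer classes when a k-mer and its reverse complement are merged."""
    words = ("".join(w) for w in product(ALPHABET, repeat=k))
    return len({min(w, reverse_complement(w)) for w in words})


if __name__ == "__main__":
    k = 3
    cyclic = de_bruijn(ALPHABET, k)
    linear = cyclic + cyclic[:k - 1]  # unroll the cycle into a linear probe
    print(len(linear), len(covered_kmers(linear, k)), canonical_classes(k))
```

Since every k-mer in a linear sequence starts at a distinct position, any sequence covering all classes must have length at least `canonical_classes(k) + k - 1`. For odd k no k-mer equals its own reverse complement, so the class count is exactly half of 4^k; for even k the self-complementary k-mers make it slightly more than half. Whether and how that bound is met is the subject of the paper, not of this sketch.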