Cargando…
Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers
Current microarray technologies to determine RNA structure or measure protein–RNA interactions rely on single-stranded, unstructured RNA probes on a chip covering together all k-mers. Since space on the array is limited, the problem is to efficiently design a compact library of unstructured ℓ-long R...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Mary Ann Liebert, Inc.
2016
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4752187/ https://www.ncbi.nlm.nih.gov/pubmed/26713687 http://dx.doi.org/10.1089/cmb.2015.0179 |
_version_ | 1782415687548928000 |
---|---|
author | Orenstein, Yaron Berger, Bonnie |
author_facet | Orenstein, Yaron Berger, Bonnie |
author_sort | Orenstein, Yaron |
collection | PubMed |
description | Current microarray technologies to determine RNA structure or measure protein–RNA interactions rely on single-stranded, unstructured RNA probes on a chip covering together all k-mers. Since space on the array is limited, the problem is to efficiently design a compact library of unstructured ℓ-long RNA probes, where each k-mer is covered at least p times. Ray et al. designed such a library for specific values of k, ℓ, and p using ad-hoc rules. To our knowledge, there is no general method to date to solve this problem. Here, we address the problem of finding a minimum-size covering of all k-mers by ℓ-long sequences with the desired properties for any value of k, ℓ, and p. As we prove that the problem is NP-hard, we give two solutions: the first is a greedy algorithm with a logarithmic approximation ratio; the second, a heuristic greedy approach based on random walks in de Bruijn graphs. The heuristic algorithm works well in practice and produces a library of unstructured RNA probes that is only ∼1.1-times greater in size compared to the theoretical lower bound. We present results for typical values of k and probe lengths ℓ and show that our algorithm generates a library that is significantly smaller than the library of Ray et al.; moreover, we show that our algorithm outperforms naive methods. Our approach can be generalized and extended to generate RNA or DNA oligo libraries with other desired properties. The software is freely available online. |
format | Online Article Text |
id | pubmed-4752187 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Mary Ann Liebert, Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-47521872016-02-23 Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers Orenstein, Yaron Berger, Bonnie J Comput Biol Research Articles Current microarray technologies to determine RNA structure or measure protein–RNA interactions rely on single-stranded, unstructured RNA probes on a chip covering together all k-mers. Since space on the array is limited, the problem is to efficiently design a compact library of unstructured ℓ-long RNA probes, where each k-mer is covered at least p times. Ray et al. designed such a library for specific values of k, ℓ, and p using ad-hoc rules. To our knowledge, there is no general method to date to solve this problem. Here, we address the problem of finding a minimum-size covering of all k-mers by ℓ-long sequences with the desired properties for any value of k, ℓ, and p. As we prove that the problem is NP-hard, we give two solutions: the first is a greedy algorithm with a logarithmic approximation ratio; the second, a heuristic greedy approach based on random walks in de Bruijn graphs. The heuristic algorithm works well in practice and produces a library of unstructured RNA probes that is only ∼1.1-times greater in size compared to the theoretical lower bound. We present results for typical values of k and probe lengths ℓ and show that our algorithm generates a library that is significantly smaller than the library of Ray et al.; moreover, we show that our algorithm outperforms naive methods. Our approach can be generalized and extended to generate RNA or DNA oligo libraries with other desired properties. The software is freely available online. Mary Ann Liebert, Inc. 2016-02-01 /pmc/articles/PMC4752187/ /pubmed/26713687 http://dx.doi.org/10.1089/cmb.2015.0179 Text en © The Author(s) 2015; Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. |
spellingShingle | Research Articles Orenstein, Yaron Berger, Bonnie Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers |
title | Efficient Design of Compact Unstructured RNA Libraries Covering All
k-mers |
title_full | Efficient Design of Compact Unstructured RNA Libraries Covering All
k-mers |
title_fullStr | Efficient Design of Compact Unstructured RNA Libraries Covering All
k-mers |
title_full_unstemmed | Efficient Design of Compact Unstructured RNA Libraries Covering All
k-mers |
title_short | Efficient Design of Compact Unstructured RNA Libraries Covering All
k-mers |
title_sort | efficient design of compact unstructured rna libraries covering all
k-mers |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4752187/ https://www.ncbi.nlm.nih.gov/pubmed/26713687 http://dx.doi.org/10.1089/cmb.2015.0179 |
work_keys_str_mv | AT orensteinyaron efficientdesignofcompactunstructuredrnalibrariescoveringallkmers AT bergerbonnie efficientdesignofcompactunstructuredrnalibrariescoveringallkmers |