Cargando…

Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers

Current microarray technologies to determine RNA structure or measure protein–RNA interactions rely on single-stranded, unstructured RNA probes on a chip covering together all k-mers. Since space on the array is limited, the problem is to efficiently design a compact library of unstructured ℓ-long R...

Descripción completa

Detalles Bibliográficos
Autores principales: Orenstein, Yaron, Berger, Bonnie
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Mary Ann Liebert, Inc. 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4752187/
https://www.ncbi.nlm.nih.gov/pubmed/26713687
http://dx.doi.org/10.1089/cmb.2015.0179
_version_ 1782415687548928000
author Orenstein, Yaron
Berger, Bonnie
author_facet Orenstein, Yaron
Berger, Bonnie
author_sort Orenstein, Yaron
collection PubMed
description Current microarray technologies to determine RNA structure or measure protein–RNA interactions rely on single-stranded, unstructured RNA probes on a chip covering together all k-mers. Since space on the array is limited, the problem is to efficiently design a compact library of unstructured ℓ-long RNA probes, where each k-mer is covered at least p times. Ray et al. designed such a library for specific values of k, ℓ, and p using ad-hoc rules. To our knowledge, there is no general method to date to solve this problem. Here, we address the problem of finding a minimum-size covering of all k-mers by ℓ-long sequences with the desired properties for any value of k, ℓ, and p. As we prove that the problem is NP-hard, we give two solutions: the first is a greedy algorithm with a logarithmic approximation ratio; the second, a heuristic greedy approach based on random walks in de Bruijn graphs. The heuristic algorithm works well in practice and produces a library of unstructured RNA probes that is only ∼1.1-times greater in size compared to the theoretical lower bound. We present results for typical values of k and probe lengths ℓ and show that our algorithm generates a library that is significantly smaller than the library of Ray et al.; moreover, we show that our algorithm outperforms naive methods. Our approach can be generalized and extended to generate RNA or DNA oligo libraries with other desired properties. The software is freely available online.
format Online
Article
Text
id pubmed-4752187
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Mary Ann Liebert, Inc.
record_format MEDLINE/PubMed
spelling pubmed-47521872016-02-23 Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers Orenstein, Yaron Berger, Bonnie J Comput Biol Research Articles Current microarray technologies to determine RNA structure or measure protein–RNA interactions rely on single-stranded, unstructured RNA probes on a chip covering together all k-mers. Since space on the array is limited, the problem is to efficiently design a compact library of unstructured ℓ-long RNA probes, where each k-mer is covered at least p times. Ray et al. designed such a library for specific values of k, ℓ, and p using ad-hoc rules. To our knowledge, there is no general method to date to solve this problem. Here, we address the problem of finding a minimum-size covering of all k-mers by ℓ-long sequences with the desired properties for any value of k, ℓ, and p. As we prove that the problem is NP-hard, we give two solutions: the first is a greedy algorithm with a logarithmic approximation ratio; the second, a heuristic greedy approach based on random walks in de Bruijn graphs. The heuristic algorithm works well in practice and produces a library of unstructured RNA probes that is only ∼1.1-times greater in size compared to the theoretical lower bound. We present results for typical values of k and probe lengths ℓ and show that our algorithm generates a library that is significantly smaller than the library of Ray et al.; moreover, we show that our algorithm outperforms naive methods. Our approach can be generalized and extended to generate RNA or DNA oligo libraries with other desired properties. The software is freely available online. Mary Ann Liebert, Inc. 2016-02-01 /pmc/articles/PMC4752187/ /pubmed/26713687 http://dx.doi.org/10.1089/cmb.2015.0179 Text en © The Author(s) 2015; Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle Research Articles
Orenstein, Yaron
Berger, Bonnie
Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers
title Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers
title_full Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers
title_fullStr Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers
title_full_unstemmed Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers
title_short Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers
title_sort efficient design of compact unstructured rna libraries covering all k-mers
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4752187/
https://www.ncbi.nlm.nih.gov/pubmed/26713687
http://dx.doi.org/10.1089/cmb.2015.0179
work_keys_str_mv AT orensteinyaron efficientdesignofcompactunstructuredrnalibrariescoveringallkmers
AT bergerbonnie efficientdesignofcompactunstructuredrnalibrariescoveringallkmers