High-scale random access on DNA storage systems

Bibliographic details
Main authors: El-Shaikh, Alex; Welzel, Marius; Heider, Dominik; Seeger, Bernhard
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2022
Subjects: High Throughput Sequencing Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8829907/
https://www.ncbi.nlm.nih.gov/pubmed/35156022
http://dx.doi.org/10.1093/nargab/lqab126

Abstract: Due to the rapid cost decline of synthesizing and sequencing deoxyribonucleic acid (DNA), high information density, and its durability of up to centuries, utilizing DNA as an information storage medium has received the attention of many scientists. State-of-the-art DNA storage systems exploit the high capacity of DNA and enable random access (predominantly random reads) by primers, which serve as unique identifiers for directly accessing data. However, primers come with a significant limitation regarding the maximum available number per DNA library. The number of different primers within a library is typically very small (e.g. ≈10). We propose a method to overcome this deficiency and present a general-purpose technique for addressing and directly accessing thousands to potentially millions of different data objects within the same DNA pool. Our approach utilizes a fountain code, sophisticated probe design, and microarray technologies. A key component is locality-sensitive hashing, making checks for dissimilarity among such a large number of probes and data objects feasible.
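The abstract names locality-sensitive hashing (LSH) as the component that makes dissimilarity checks among a very large number of probes feasible. The following is a minimal sketch of that general idea only, not the authors' implementation: it assumes MinHash signatures over k-mer sets combined with LSH banding (a standard LSH family for Jaccard similarity), and the function names and parameters (`kmers`, `minhash`, `similar_pairs`, `k`, `bands`, `rows`, `threshold`) are hypothetical. Pairs that LSH flags as candidates are verified with an exact Jaccard check, so only probes that are actually too similar are reported.

```python
import random
from collections import defaultdict
from itertools import combinations

BASE = {"A": 0, "C": 1, "G": 2, "T": 3}
PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def kmers(seq: str, k: int) -> set[int]:
    """Return the set of k-mers of seq, each encoded as a base-4 integer."""
    out = set()
    for i in range(len(seq) - k + 1):
        code = 0
        for ch in seq[i:i + k]:
            code = code * 4 + BASE[ch]
        out.add(code)
    return out


def minhash(shingles: set[int], params: list[tuple[int, int]]) -> list[int]:
    """MinHash signature: for each hash (a*x + b) mod p keep the minimum value."""
    return [min(((a * x + b) % PRIME for x in shingles), default=PRIME)
            for a, b in params]


def jaccard(s: set[int], t: set[int]) -> float:
    """Exact Jaccard similarity of two k-mer sets."""
    return len(s & t) / len(s | t) if (s or t) else 1.0


def similar_pairs(seqs, k=6, bands=16, rows=4, threshold=0.5, seed=0):
    """Return index pairs (i, j), i < j, whose k-mer Jaccard >= threshold.

    Hypothetical helper. Only sequences that agree on all `rows` signature
    values of at least one band become candidates, so most dissimilar
    pairs are never compared directly.
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
              for _ in range(bands * rows)]
    shingle_sets = [kmers(s, k) for s in seqs]
    sigs = [minhash(sh, params) for sh in shingle_sets]

    # Bucket every sequence once per band, keyed by that band's values.
    buckets = defaultdict(list)
    for idx, sig in enumerate(sigs):
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(idx)

    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(members, 2))

    # Exact verification removes LSH false positives; pairs near the
    # threshold can still be missed (false negatives) with small probability.
    return sorted((i, j) for i, j in candidates
                  if jaccard(shingle_sets[i], shingle_sets[j]) >= threshold)


if __name__ == "__main__":
    probes = [
        "ACGTACGTTGCAACGTAGCT",
        "ACGTACGTTGCAACGTAGCA",  # differs from probe 0 only in the last base
        "TTTTGGGGCCCCAAAATTGG",  # shares no 6-mer with probe 0
    ]
    print(similar_pairs(probes))
```

With `bands` bands of `rows` rows, an ideal MinHash makes a pair with Jaccard similarity s a candidate with probability 1 - (1 - s^rows)^bands, so these two parameters trade recall against the number of pairs that still need exact verification.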

Journal: NAR Genom Bioinform (NAR Genomics and Bioinformatics)
Publication date: 2022-01-14
Identifiers: pubmed-8829907 (2022-02-11); PMC8829907; PubMed 35156022; DOI 10.1093/nargab/lqab126
Rights: © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.