
Molecular-level similarity search brings computing to DNA data storage

As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from...

Bibliographic Details
Main authors: Bee, Callista; Chen, Yuan-Jyue; Queen, Melissa; Ward, David; Liu, Xiaomeng; Organick, Lee; Seelig, Georg; Strauss, Karin; Ceze, Luis
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346626/
https://www.ncbi.nlm.nih.gov/pubmed/34362913
http://dx.doi.org/10.1038/s41467-021-24991-z
author Bee, Callista
Chen, Yuan-Jyue
Queen, Melissa
Ward, David
Liu, Xiaomeng
Organick, Lee
Seelig, Georg
Strauss, Karin
Ceze, Luis
author_sort Bee, Callista
collection PubMed
description As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as similarity search: e.g., finding images that look similar to an image of interest without prior knowledge of their file names. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach was to learn an image-to-sequence encoding ensuring that queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.
format Online
Article
Text
id pubmed-8346626
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8346626 2021-08-20 Molecular-level similarity search brings computing to DNA data storage Nat Commun Article Nature Publishing Group UK 2021-08-06 /pmc/articles/PMC8346626/ /pubmed/34362913 http://dx.doi.org/10.1038/s41467-021-24991-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Molecular-level similarity search brings computing to DNA data storage
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346626/
https://www.ncbi.nlm.nih.gov/pubmed/34362913
http://dx.doi.org/10.1038/s41467-021-24991-z