A highly parallel strategy for storage of digital information in living cells

Bibliographic details
Main authors: Akhmetov, Azat; Ellington, Andrew D.; Marcotte, Edward M.
Format: Online article (text)
Language: English
Published: BioMed Central, 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191901/
https://www.ncbi.nlm.nih.gov/pubmed/30333005
http://dx.doi.org/10.1186/s12896-018-0476-4

Abstract
BACKGROUND: Encoding arbitrary digital information in DNA has attracted attention as a potential avenue for large-scale and long-term data storage. However, enabling DNA data storage technologies requires improvements in data storage fidelity (tolerance to mutation), in the ease of writing and reading the data (biases and systematic errors arising from synthesis and sequencing), and in overall scalability.

RESULTS: To this end, we have developed and implemented an encoding scheme that can detect and correct errors arising during storage, writing, and reading, such as nucleotide substitutions, insertions, and deletions. We propose a scheme for parallelized long-term storage of encoded sequences that relies on overlaps rather than the address blocks found in previously published work. Using computer simulations, we illustrate the encoding, sequencing, decoding, and recovery of encoded information, ultimately demonstrating that a successful round-trip read/write is possible. These demonstrations show that precise control over error tolerance is possible in theory. Even after simulated DNA degradation, the original data can be recovered owing to the error-correction capabilities built into the encoding strategy. A secondary advantage of our method is that the statistical characteristics of encoded sequences (such as repetitiveness and GC composition) can also be tailored without sacrificing the ability to store large amounts of data. Finally, because the overlap-based partitioning of data is combined with the LZMA compression that is integral to encoding, the entire sequence must be present for successful decoding. This feature enables very strong encryption.
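The all-or-nothing property above rests partly on LZMA compression. A minimal sketch of the compression side only, using Python's standard `lzma` module: a complete xz stream decodes back to the payload, while a stream with half of its bytes missing makes the one-shot `lzma.decompress` raise `LZMAError`. The helper names `compress_payload` and `try_decode` and the sample payload are hypothetical, and this is not the authors' encoder. Note that an incremental `lzma.LZMADecompressor` can still emit a partial prefix of a truncated stream, so the sketch illustrates one-shot decoding behavior, not a security guarantee; in the paper it is the combination with overlap-based partitioning that is claimed to require the whole sequence.

```python
import lzma
from typing import Optional


def compress_payload(data: bytes) -> bytes:
    """Hypothetical helper: compress the raw payload with LZMA (xz container)."""
    return lzma.compress(data)


def try_decode(blob: bytes) -> Optional[bytes]:
    """Return the decompressed payload, or None if the stream is incomplete or corrupt."""
    try:
        # One-shot decoding raises LZMAError if the end-of-stream marker is never reached.
        return lzma.decompress(blob)
    except lzma.LZMAError:
        return None


payload = b"hypothetical payload to be stored in DNA " * 50
blob = compress_payload(payload)

full = try_decode(blob)                       # complete stream
partial = try_decode(blob[: len(blob) // 2])  # second half of the stream is missing
print(full == payload)  # True
print(partial)          # None
```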
As a potential application, an encrypted pathogen genome could be distributed and carried by cells without danger of being expressed, and could not be read out at all without the entire DNA consortium.

CONCLUSIONS: We have developed a method for DNA encoding that takes a fundamentally different approach from existing work, often performs better than the alternatives, and offers considerable freedom and flexibility of application.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12896-018-0476-4) contains supplementary material, which is available to authorized users.
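The overlap-based partitioning that the abstract contrasts with address blocks can be sketched as follows, assuming error-free fragments and overlaps long enough to be unique within the sequence. The functions `partition_with_overlaps` and `reassemble` and the fragment and overlap lengths are hypothetical choices for illustration: consecutive fragments share `overlap` characters, the fragments are shuffled so they carry no index, and their order is recovered purely by exact suffix-to-prefix matching. The published scheme additionally tolerates substitutions, insertions, and deletions, which this exact-match sketch does not.

```python
import random


def partition_with_overlaps(seq: str, frag_len: int, overlap: int) -> list[str]:
    """Cut seq into fragments of frag_len; consecutive fragments share `overlap` characters."""
    if not 0 < overlap < frag_len:
        raise ValueError("need 0 < overlap < frag_len")
    step = frag_len - overlap
    frags, start = [], 0
    while True:
        frags.append(seq[start:start + frag_len])
        if start + frag_len >= len(seq):
            break  # this fragment reaches the end of the sequence
        start += step
    return frags


def reassemble(frags: list[str], overlap: int) -> str:
    """Rebuild the sequence by chaining exact suffix/prefix overlaps (no addresses used)."""
    by_prefix = {f[:overlap]: f for f in frags}
    if len(by_prefix) != len(frags):
        raise ValueError("overlap prefixes are not unique")
    suffixes = {f[-overlap:] for f in frags}
    # Assuming unique overlaps, only the first fragment's prefix is not the suffix of any fragment.
    starts = [f for f in frags if f[:overlap] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot identify a unique first fragment")
    current = result = starts[0]
    used = 1
    while used < len(frags):
        nxt = by_prefix.get(current[-overlap:])
        if nxt is None:
            raise ValueError("chain broken: a fragment is missing or corrupted")
        result += nxt[overlap:]  # append only the non-overlapping tail
        current = nxt
        used += 1
    return result


rng = random.Random(0)
seq = "".join(rng.choice("ACGT") for _ in range(200))
frags = partition_with_overlaps(seq, frag_len=40, overlap=20)
rng.shuffle(frags)  # discard ordering information, as if fragments were pooled
restored = reassemble(frags, overlap=20)
print(restored == seq)
```

Exact matching is the simplest design choice here; with sequencing errors, recovering the order would require approximate overlap matching or error correction on each fragment before chaining.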

Source: PubMed record pubmed-6191901 (MEDLINE/PubMed format), National Center for Biotechnology Information
Journal: BMC Biotechnol (Research Article), BioMed Central, published 2018-10-17
Copyright: © The Author(s). 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.