
The Essential Component in DNA-Based Information Storage System: Robust Error-Tolerating Module



Bibliographic Details
Main authors: Yim, Aldrin Kay-Yuen, Yu, Allen Chi-Shing, Li, Jing-Woei, Wong, Ada In-Chun, Loo, Jacky F. C., Chan, King Ming, Kong, S. K., Yip, Kevin Y., Chan, Ting-Fung
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4222239/
https://www.ncbi.nlm.nih.gov/pubmed/25414846
http://dx.doi.org/10.3389/fbioe.2014.00049
Description
Summary: The size of digital data is ever increasing and is expected to grow to 40,000 EB by 2020, yet the estimated global information storage capacity in 2011 was <300 EB, indicating that most of the data are transient. DNA, a highly stable nanomolecule, is an ideal medium for massive, long-term data archiving. The two most notable demonstrations are those of Church et al. and Goldman et al., whose approaches are well optimized for most sequencing platforms: short synthesized DNA fragments without homopolymers. Here, we suggest improvements to error-handling methodology that could enable the integration of DNA-based computational processes, e.g., algorithms based on DNA self-assembly. As a proof of concept, a 438-byte picture was encoded into DNA using a low-density parity-check (LDPC) error-correction code. We salvaged a significant portion of the sequencing reads that carried mutations introduced during DNA synthesis and sequencing, and successfully reconstructed the entire picture. We also introduced DNAcodec, a modular programming framework with an eXtensible Markup Language (XML)-based data format. Our experiments demonstrated the practicability of recovering long DNA messages with high error tolerance, which opens the field to biocomputing and synthetic biology.
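The summary names low-density parity-check (LDPC) coding but does not describe the parity-check matrix or decoder that was used. As a rough illustration of how an LDPC decoder can repair substitution errors, the Python sketch below implements Gallager-style hard-decision bit flipping on a hypothetical toy code. The matrix `H` (built from the edges of a four-vertex complete graph, so every bit sits in exactly two checks), the function names `syndrome` and `bit_flip_decode`, and the substitution-only error model are all assumptions for illustration, not the authors' DNAcodec implementation. Real DNA channels also produce insertions and deletions, and practical LDPC decoders typically use soft-decision belief propagation on much larger matrices.

```python
from itertools import combinations

# Hypothetical toy parity-check matrix (NOT the matrix used in the paper).
# Columns are the 6 edges of the complete graph on 4 vertices, rows are the
# vertices: every bit lies in exactly 2 checks and two bits share at most 1.
EDGES = list(combinations(range(4), 2))  # (0,1) (0,2) (0,3) (1,2) (1,3) (2,3)
H = [[1 if v in edge else 0 for edge in EDGES] for v in range(4)]


def syndrome(H, word):
    """Evaluate every parity check over GF(2); all zeros means a valid codeword."""
    return [sum(h & w for h, w in zip(row, word)) % 2 for row in H]


def bit_flip_decode(H, received, max_iters=20):
    """Gallager-style hard-decision bit-flipping decoder (substitutions only).

    Each round flips every bit for which a strict majority of its checks fail.
    Returns (word, success) where success means all checks are satisfied.
    """
    word = [b & 1 for b in received]
    n = len(word)
    degree = [sum(row[j] for row in H) for j in range(n)]  # checks per bit
    for _ in range(max_iters):
        s = syndrome(H, word)
        if not any(s):
            return word, True
        # For each bit, count how many of its parity checks are unsatisfied.
        unsatisfied = [sum(s_i & row[j] for s_i, row in zip(s, H)) for j in range(n)]
        flips = [j for j in range(n) if 2 * unsatisfied[j] > degree[j]]
        if not flips:
            break  # no bit has a failing majority: decoder is stuck
        for j in flips:
            word[j] ^= 1
    return word, not any(syndrome(H, word))


if __name__ == "__main__":
    codeword = [1, 1, 0, 1, 0, 0]  # edges of triangle 0-1-2 satisfy every check
    received = codeword[:]
    received[4] ^= 1  # simulate one substitution error
    print(bit_flip_decode(H, received))  # → ([1, 1, 0, 1, 0, 0], True)
```

Because each bit sits in two checks and any two bits share at most one, a single substitution makes both of its checks fail while every other bit sees at most one failure, so the strict-majority rule flips exactly the corrupted bit. With a matrix this small, two errors can already drive the decoder to a wrong codeword; it demonstrates the mechanism only, not a practical code.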