
Quantifying molecular bias in DNA data storage

DNA has recently emerged as an attractive medium for archival data storage. Recent work has demonstrated proof-of-principle prototype systems; however, very uneven (biased) sequencing coverage has been reported, which indicates inefficiencies in the storage process. Deviations from the average cover...


Bibliographic Details
Main Authors: Chen, Yuan-Jyue, Takahashi, Christopher N., Organick, Lee, Bee, Callista, Ang, Siena Dumas, Weiss, Patrick, Peck, Bill, Seelig, Georg, Ceze, Luis, Strauss, Karin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7324401/
https://www.ncbi.nlm.nih.gov/pubmed/32601272
http://dx.doi.org/10.1038/s41467-020-16958-3
author Chen, Yuan-Jyue
Takahashi, Christopher N.
Organick, Lee
Bee, Callista
Ang, Siena Dumas
Weiss, Patrick
Peck, Bill
Seelig, Georg
Ceze, Luis
Strauss, Karin
collection PubMed
description DNA has recently emerged as an attractive medium for archival data storage. Recent work has demonstrated proof-of-principle prototype systems; however, very uneven (biased) sequencing coverage has been reported, which indicates inefficiencies in the storage process. Deviations from the average coverage in the sequence copy distribution can cause either wasteful provisioning in sequencing or an excessive number of missing sequences. Here, we use millions of unique sequences from a DNA-based digital data archival system to study the oligonucleotide copy unevenness problem and show that the two paramount sources of bias are the synthesis and amplification (PCR) processes. Based on these findings, we develop a statistical model for each molecular process as well as the overall process. We further use our model to explore the trade-offs between synthesis bias, storage physical density, logical redundancy, and sequencing redundancy, providing insights for engineering efficient, robust DNA data storage systems.
format Online
Article
Text
id pubmed-7324401
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7324401 2020-07-06 Nat Commun Article Nature Publishing Group UK 2020-06-29 /pmc/articles/PMC7324401/ /pubmed/32601272 http://dx.doi.org/10.1038/s41467-020-16958-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material.
If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Quantifying molecular bias in DNA data storage
topic Article