Quantifying molecular bias in DNA data storage
DNA has recently emerged as an attractive medium for archival data storage. Recent work has demonstrated proof-of-principle prototype systems; however, very uneven (biased) sequencing coverage has been reported, which indicates inefficiencies in the storage process. Deviations from the average cover...
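Main authors: | Chen, Yuan-Jyue; Takahashi, Christopher N.; Organick, Lee; Bee, Callista; Ang, Siena Dumas; Weiss, Patrick; Peck, Bill; Seelig, Georg; Ceze, Luis; Strauss, Karin |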
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7324401/ https://www.ncbi.nlm.nih.gov/pubmed/32601272 http://dx.doi.org/10.1038/s41467-020-16958-3 |
_version_ | 1783551934253236224 |
---|---|
author | Chen, Yuan-Jyue; Takahashi, Christopher N.; Organick, Lee; Bee, Callista; Ang, Siena Dumas; Weiss, Patrick; Peck, Bill; Seelig, Georg; Ceze, Luis; Strauss, Karin |
author_sort | Chen, Yuan-Jyue |
collection | PubMed |
description | DNA has recently emerged as an attractive medium for archival data storage. Recent work has demonstrated proof-of-principle prototype systems; however, very uneven (biased) sequencing coverage has been reported, which indicates inefficiencies in the storage process. Deviations from the average coverage in the sequence copy distribution can cause either wasteful provisioning in sequencing or an excessive number of missing sequences. Here, we use millions of unique sequences from a DNA-based digital data archival system to study the oligonucleotide copy unevenness problem and show that the two paramount sources of bias are the synthesis and amplification (PCR) processes. Based on these findings, we develop a statistical model for each molecular process as well as the overall process. We further use our model to explore the trade-offs between synthesis bias, storage physical density, logical redundancy, and sequencing redundancy, providing insights for engineering efficient, robust DNA data storage systems. |
format | Online Article Text |
id | pubmed-7324401 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7324401 2020-07-06 Quantifying molecular bias in DNA data storage. Chen, Yuan-Jyue; Takahashi, Christopher N.; Organick, Lee; Bee, Callista; Ang, Siena Dumas; Weiss, Patrick; Peck, Bill; Seelig, Georg; Ceze, Luis; Strauss, Karin. Nat Commun. Article. Nature Publishing Group UK, 2020-06-29. /pmc/articles/PMC7324401/ /pubmed/32601272 http://dx.doi.org/10.1038/s41467-020-16958-3 Text en. © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Quantifying molecular bias in DNA data storage |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7324401/ https://www.ncbi.nlm.nih.gov/pubmed/32601272 http://dx.doi.org/10.1038/s41467-020-16958-3 |