Reducing cost in DNA-based data storage by sequence analysis-aided soft information decoding of variable-length reads
MOTIVATION: DNA-based data storage is one of the most attractive research areas for future archival storage. However, it faces the problems of high writing and reading costs for practical use. There have been many efforts to resolve this problem, but existing schemes are not fully suitable for DNA-b...
Main authors: | Park, Seong-Joon; Kim, Sunghwan; Jeong, Jaeho; No, Albert; No, Jong-Seon; Park, Hosung |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10500082/ https://www.ncbi.nlm.nih.gov/pubmed/37669160 http://dx.doi.org/10.1093/bioinformatics/btad548 |
_version_ | 1785105848744280064 |
---|---|
author | Park, Seong-Joon Kim, Sunghwan Jeong, Jaeho No, Albert No, Jong-Seon Park, Hosung |
author_sort | Park, Seong-Joon |
collection | PubMed |
description | MOTIVATION: DNA-based data storage is one of the most attractive research areas for future archival storage. However, it faces the problems of high writing and reading costs for practical use. There have been many efforts to resolve this problem, but existing schemes are not fully suitable for DNA-based data storage, and more cost reduction is needed. RESULTS: We propose whole encoding and decoding procedures for DNA storage. The encoding procedure consists of a carefully designed single low-density parity-check code as an inter-oligo code, which corrects errors and dropouts efficiently. We apply new clustering and alignment methods that operate on variable-length reads to aid the decoding performance. We use edit distance and quality scores during the sequence analysis-aided decoding procedure, which can discard abnormal reads and utilize high-quality soft information. We store 548.83 KB of an image file in DNA oligos and achieve a writing cost reduction of 7.46% and a significant reading cost reduction of 26.57% and 19.41% compared with the two previous works. AVAILABILITY AND IMPLEMENTATION: Data and codes for all the algorithms proposed in this study are available at: https://github.com/sjpark0905/DNA-LDPC-codes. |
format | Online Article Text |
id | pubmed-10500082 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10500082 2023-09-15 Reducing cost in DNA-based data storage by sequence analysis-aided soft information decoding of variable-length reads Park, Seong-Joon Kim, Sunghwan Jeong, Jaeho No, Albert No, Jong-Seon Park, Hosung Bioinformatics Original Paper MOTIVATION: DNA-based data storage is one of the most attractive research areas for future archival storage. However, it faces the problems of high writing and reading costs for practical use. There have been many efforts to resolve this problem, but existing schemes are not fully suitable for DNA-based data storage, and more cost reduction is needed. RESULTS: We propose whole encoding and decoding procedures for DNA storage. The encoding procedure consists of a carefully designed single low-density parity-check code as an inter-oligo code, which corrects errors and dropouts efficiently. We apply new clustering and alignment methods that operate on variable-length reads to aid the decoding performance. We use edit distance and quality scores during the sequence analysis-aided decoding procedure, which can discard abnormal reads and utilize high-quality soft information. We store 548.83 KB of an image file in DNA oligos and achieve a writing cost reduction of 7.46% and a significant reading cost reduction of 26.57% and 19.41% compared with the two previous works. AVAILABILITY AND IMPLEMENTATION: Data and codes for all the algorithms proposed in this study are available at: https://github.com/sjpark0905/DNA-LDPC-codes. Oxford University Press 2023-09-05 /pmc/articles/PMC10500082/ /pubmed/37669160 http://dx.doi.org/10.1093/bioinformatics/btad548 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Park, Seong-Joon Kim, Sunghwan Jeong, Jaeho No, Albert No, Jong-Seon Park, Hosung Reducing cost in DNA-based data storage by sequence analysis-aided soft information decoding of variable-length reads |
title | Reducing cost in DNA-based data storage by sequence analysis-aided soft information decoding of variable-length reads |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10500082/ https://www.ncbi.nlm.nih.gov/pubmed/37669160 http://dx.doi.org/10.1093/bioinformatics/btad548 |
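
The description in this record mentions using edit distance and quality scores to discard abnormal reads and to derive soft information. The Python sketch below only illustrates that general idea and is not the authors' implementation (their code is at the GitHub URL given in the description). The function names, the `reference` sequence, and the `max_dist` threshold are hypothetical, introduced here for illustration. The only external fact relied on is the standard Phred relation, error probability = 10^(-Q/10).

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def filter_reads(reads, reference, max_dist):
    """Keep reads within max_dist edits of a (hypothetical) reference."""
    return [r for r in reads if edit_distance(r, reference) <= max_dist]


def phred_to_error_prob(q: float) -> float:
    """Standard Phred relation: P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def base_reliability(q: float) -> float:
    """Log-odds that a base call is correct, usable as a soft weight.

    The probability is clamped so that Q = 0 does not produce log(0).
    """
    p = min(max(phred_to_error_prob(q), 1e-12), 1 - 1e-12)
    return math.log((1 - p) / p)


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "TTTT"]
    print(filter_reads(reads, "ACGT", max_dist=1))
    print(base_reliability(30) > base_reliability(10))
```

Because edit distance handles insertions and deletions, the filter works on reads of different lengths, which is the variable-length setting the description refers to. How the threshold is chosen and how reliabilities feed the LDPC decoder are left open here.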