
Reducing cost in DNA-based data storage by sequence analysis-aided soft information decoding of variable-length reads

MOTIVATION: DNA-based data storage is one of the most attractive research areas for future archival storage. However, it faces the problems of high writing and reading costs for practical use. There have been many efforts to resolve this problem, but existing schemes are not fully suitable for DNA-b...


Bibliographic Details
Main Authors: Park, Seong-Joon, Kim, Sunghwan, Jeong, Jaeho, No, Albert, No, Jong-Seon, Park, Hosung
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10500082/
https://www.ncbi.nlm.nih.gov/pubmed/37669160
http://dx.doi.org/10.1093/bioinformatics/btad548
author Park, Seong-Joon
Kim, Sunghwan
Jeong, Jaeho
No, Albert
No, Jong-Seon
Park, Hosung
collection PubMed
description MOTIVATION: DNA-based data storage is one of the most attractive research areas for future archival storage. However, it faces the problems of high writing and reading costs for practical use. There have been many efforts to resolve this problem, but existing schemes are not fully suitable for DNA-based data storage, and more cost reduction is needed. RESULTS: We propose whole encoding and decoding procedures for DNA storage. The encoding procedure consists of a carefully designed single low-density parity-check code as an inter-oligo code, which corrects errors and dropouts efficiently. We apply new clustering and alignment methods that operate on variable-length reads to aid the decoding performance. We use edit distance and quality scores during the sequence analysis-aided decoding procedure, which can discard abnormal reads and utilize high-quality soft information. We store 548.83 KB of an image file in DNA oligos and achieve a writing cost reduction of 7.46% and a significant reading cost reduction of 26.57% and 19.41% compared with the two previous works. AVAILABILITY AND IMPLEMENTATION: Data and codes for all the algorithms proposed in this study are available at: https://github.com/sjpark0905/DNA-LDPC-codes.
format Online
Article
Text
id pubmed-10500082
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10500082 2023-09-15 Bioinformatics Original Paper Oxford University Press 2023-09-05 /pmc/articles/PMC10500082/ /pubmed/37669160 http://dx.doi.org/10.1093/bioinformatics/btad548 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Reducing cost in DNA-based data storage by sequence analysis-aided soft information decoding of variable-length reads
topic Original Paper