
Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage

Synchronization (insertions–deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved...


Bibliographic Details
Main authors: Xie, Ranze; Zan, Xiangzhen; Chu, Ling; Su, Yanqing; Xu, Peng; Liu, Wenbin
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10037887/
https://www.ncbi.nlm.nih.gov/pubmed/36959531
http://dx.doi.org/10.1186/s12859-023-05237-9
_version_ 1784911970278834176
author Xie, Ranze
Zan, Xiangzhen
Chu, Ling
Su, Yanqing
Xu, Peng
Liu, Wenbin
collection PubMed
description Synchronization (insertions–deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved subsequences. In this paper, we conduct a comprehensive simulation study on the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition when there are around 20% errors. Below this critical value, increasing sequencing depth can eventually allow it to approach complete recovery. Otherwise, its performance plateaus at some poor levels. Given a reasonable sequencing depth (≤ 70), MSA could achieve complete recovery in the low error regime, and effectively correct 90% of the errors in the medium error regime. In addition, MSA is robust to imperfect clustering. It could also be combined with other means such as ECC, repeated markers, or any other code constraints. Furthermore, by selecting an appropriate sequencing depth, this strategy could achieve an optimal trade-off between cost and reading speed. MSA could be a competitive alternative for future DNA storage.
format Online
Article
Text
id pubmed-10037887
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10037887 2023-03-25 Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage Xie, Ranze Zan, Xiangzhen Chu, Ling Su, Yanqing Xu, Peng Liu, Wenbin BMC Bioinformatics Research
BioMed Central 2023-03-23 /pmc/articles/PMC10037887/ /pubmed/36959531 http://dx.doi.org/10.1186/s12859-023-05237-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage
topic Research