Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage
Synchronization (insertions–deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved...
Main authors: | Xie, Ranze; Zan, Xiangzhen; Chu, Ling; Su, Yanqing; Xu, Peng; Liu, Wenbin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10037887/ https://www.ncbi.nlm.nih.gov/pubmed/36959531 http://dx.doi.org/10.1186/s12859-023-05237-9 |
_version_ | 1784911970278834176 |
---|---|
author | Xie, Ranze Zan, Xiangzhen Chu, Ling Su, Yanqing Xu, Peng Liu, Wenbin |
author_sort | Xie, Ranze |
collection | PubMed |
description | Synchronization (insertions–deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved subsequences. In this paper, we conduct a comprehensive simulation study on the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition when there are around 20% errors. Below this critical value, increasing sequencing depth can eventually allow it to approach complete recovery. Otherwise, its performance plateaus at some poor levels. Given a reasonable sequencing depth (≤ 70), MSA could achieve complete recovery in the low error regime, and effectively correct 90% of the errors in the medium error regime. In addition, MSA is robust to imperfect clustering. It could also be combined with other means such as ECC, repeated markers, or any other code constraints. Furthermore, by selecting an appropriate sequencing depth, this strategy could achieve an optimal trade-off between cost and reading speed. MSA could be a competitive alternative for future DNA storage. |
format | Online Article Text |
id | pubmed-10037887 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10037887 2023-03-25 BMC Bioinformatics Research BioMed Central 2023-03-23 /pmc/articles/PMC10037887/ /pubmed/36959531 http://dx.doi.org/10.1186/s12859-023-05237-9 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage |
topic | Research |
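
The description in this record says that MSA corrects insertion and deletion errors by finding the subsequences that are conserved across noisy reads. As a rough illustration of that idea only, the Python sketch below takes reads that have already been aligned (for example by MAFFT, which is not invoked here) and builds a consensus by majority vote in each column, dropping columns where the gap symbol wins. The function name `consensus_from_alignment`, the `-` gap symbol, and the toy reads are assumptions made for this example; this is not the authors' pipeline and does not reproduce any result from the article.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol, as commonly used in aligned FASTA output


def consensus_from_alignment(aligned_reads):
    """Return a majority-vote consensus of equally long, already-aligned reads.

    Columns whose most common symbol is the gap are dropped, which discards
    bases that only a minority of reads inserted.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # most_common(1) returns the single most frequent symbol in the column
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != GAP:
            consensus.append(symbol)
    return "".join(consensus)


# Toy alignment of five noisy copies of the hypothetical reference "ACGTACGT".
aligned = [
    "ACGT-ACGT",  # error-free read
    "AC-T-ACGT",  # one deletion
    "ACGTGACGT",  # one insertion
    "ACGT-ACGA",  # one substitution
    "ACGT-ACGT",  # error-free read
]
recovered = consensus_from_alignment(aligned)
print(recovered)
```

In this toy case each erroneous column is outvoted by the four other reads, so the consensus matches the reference. With fewer reads or more errors per read the vote can fail, which is loosely the dependence on sequencing depth and error rate that the description refers to.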