
SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array

Multiple DNA/RNA sequence alignment is an important fundamental tool in bioinformatics, especially for phylogenetic tree construction. With DNA-sequencing improvements, the amount of bioinformatics data is constantly increasing, and various tools need to be iterated constantly. Mitochondrial genome...


Bibliographic Details
Main authors: Wang, Ziyuan, Tan, Junjie, Long, Yanling, Liu, Yijia, Lei, Wenyan, Cai, Jing, Yang, Yi, Liu, Zhibin
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8976100/
https://www.ncbi.nlm.nih.gov/pubmed/35422971
http://dx.doi.org/10.1016/j.csbj.2022.03.018
_version_ 1784680492366299136
author Wang, Ziyuan
Tan, Junjie
Long, Yanling
Liu, Yijia
Lei, Wenyan
Cai, Jing
Yang, Yi
Liu, Zhibin
author_facet Wang, Ziyuan
Tan, Junjie
Long, Yanling
Liu, Yijia
Lei, Wenyan
Cai, Jing
Yang, Yi
Liu, Zhibin
author_sort Wang, Ziyuan
collection PubMed
description Multiple DNA/RNA sequence alignment is a fundamental tool in bioinformatics, especially for phylogenetic tree construction. As DNA sequencing improves, the volume of bioinformatics data keeps growing, and alignment tools must be updated continually. Mitochondrial genome analyses of multiple individuals and species rely on bioinformatics software, so its performance needs to be optimized. To improve the alignment of ultra-large datasets and ultra-long sequences, we optimized a dynamic programming algorithm using longest common substring methods. On ultra-large test DNA datasets containing sequences of different lengths, some over 300 kb (kilobases), the Multiple DNA/RNA Sequence Alignment Tool Based on Suffix Tree (SaAlign) saved time and memory and outperformed existing tools, including MAFFT and HAlign-II. For mitochondrial genome datasets with limited numbers of sequences, MAFFT performed the required tasks, but it could not handle ultra-large mitochondrial genome datasets because of a core dump error. We implemented a multiple DNA/RNA sequence alignment tool based on the center star strategy and used a suffix array algorithm to improve its space and time efficiency. As whole-genome research and NGS technology become more popular, laboratories need to save computational resources. This software is of great significance in these respects, especially for studying the whole mitochondrial genomes of plants.
format Online
Article
Text
id pubmed-8976100
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-89761002022-04-13 SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array Wang, Ziyuan Tan, Junjie Long, Yanling Liu, Yijia Lei, Wenyan Cai, Jing Yang, Yi Liu, Zhibin Comput Struct Biotechnol J Research Article Multiple DNA/RNA sequence alignment is a fundamental tool in bioinformatics, especially for phylogenetic tree construction. As DNA sequencing improves, the volume of bioinformatics data keeps growing, and alignment tools must be updated continually. Mitochondrial genome analyses of multiple individuals and species rely on bioinformatics software, so its performance needs to be optimized. To improve the alignment of ultra-large datasets and ultra-long sequences, we optimized a dynamic programming algorithm using longest common substring methods. On ultra-large test DNA datasets containing sequences of different lengths, some over 300 kb (kilobases), the Multiple DNA/RNA Sequence Alignment Tool Based on Suffix Tree (SaAlign) saved time and memory and outperformed existing tools, including MAFFT and HAlign-II. For mitochondrial genome datasets with limited numbers of sequences, MAFFT performed the required tasks, but it could not handle ultra-large mitochondrial genome datasets because of a core dump error. We implemented a multiple DNA/RNA sequence alignment tool based on the center star strategy and used a suffix array algorithm to improve its space and time efficiency. As whole-genome research and NGS technology become more popular, laboratories need to save computational resources. This software is of great significance in these respects, especially for studying the whole mitochondrial genomes of plants.
Research Network of Computational and Structural Biotechnology 2022-03-21 /pmc/articles/PMC8976100/ /pubmed/35422971 http://dx.doi.org/10.1016/j.csbj.2022.03.018 Text en © 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Research Article
Wang, Ziyuan
Tan, Junjie
Long, Yanling
Liu, Yijia
Lei, Wenyan
Cai, Jing
Yang, Yi
Liu, Zhibin
SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array
title SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array
title_full SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array
title_fullStr SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array
title_full_unstemmed SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array
title_short SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array
title_sort saalign: multiple dna/rna sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8976100/
https://www.ncbi.nlm.nih.gov/pubmed/35422971
http://dx.doi.org/10.1016/j.csbj.2022.03.018
work_keys_str_mv AT wangziyuan saalignmultiplednarnasequencealignmentandphylogenetictreeconstructiontoolforultralargedatasetsandultralongsequencesbasedonsuffixarray
AT tanjunjie saalignmultiplednarnasequencealignmentandphylogenetictreeconstructiontoolforultralargedatasetsandultralongsequencesbasedonsuffixarray
AT longyanling saalignmultiplednarnasequencealignmentandphylogenetictreeconstructiontoolforultralargedatasetsandultralongsequencesbasedonsuffixarray
AT liuyijia saalignmultiplednarnasequencealignmentandphylogenetictreeconstructiontoolforultralargedatasetsandultralongsequencesbasedonsuffixarray
AT leiwenyan saalignmultiplednarnasequencealignmentandphylogenetictreeconstructiontoolforultralargedatasetsandultralongsequencesbasedonsuffixarray
AT caijing saalignmultiplednarnasequencealignmentandphylogenetictreeconstructiontoolforultralargedatasetsandultralongsequencesbasedonsuffixarray
AT yangyi saalignmultiplednarnasequencealignmentandphylogenetictreeconstructiontoolforultralargedatasetsandultralongsequencesbasedonsuffixarray
AT liuzhibin saalignmultiplednarnasequencealignmentandphylogenetictreeconstructiontoolforultralargedatasetsandultralongsequencesbasedonsuffixarray
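
Illustrative note: the abstract in this record says the tool uses longest common substring methods and a suffix array, but the record itself does not show how. The Python sketch below is not SaAlign's code. It only illustrates the general technique of finding the longest common substring of two sequences with a suffix array and an LCP array. All function names are hypothetical, the separator character is an assumption, and the suffix array is built by naive sorting for brevity, which would be far too slow for the ultra-long sequences the abstract mentions.

```python
def build_suffix_array(text):
    """Return suffix start positions in lexicographic order of the suffixes.

    Naive construction (sorts full suffixes); fine for a small illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp(text, sa):
    """Kasai's algorithm: lcp[r] is the length of the common prefix of the
    suffixes at ranks r-1 and r in the suffix array (lcp[0] is 0)."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def longest_common_substring(a, b, sep="#"):
    """Longest substring shared by a and b.

    The two sequences are joined with a separator (assumed absent from both),
    so no common prefix can run across the boundary between them.
    """
    if sep in a or sep in b:
        raise ValueError("separator must not occur in either sequence")
    text = a + sep + b
    sa = build_suffix_array(text)
    lcp = build_lcp(text, sa)
    best_len, best_start = 0, 0
    for r in range(1, len(text)):
        # Only neighbouring suffixes that start in different sequences count.
        prev_in_a = sa[r - 1] < len(a)
        cur_in_a = sa[r] < len(a)
        if prev_in_a != cur_in_a and lcp[r] > best_len:
            best_len, best_start = lcp[r], sa[r]
    return text[best_start:best_start + best_len]


if __name__ == "__main__":
    print(longest_common_substring("ACGTTGCA", "TTGCAACG"))
```

In a center-star style aligner, shared substrings of this kind could serve as anchors so that dynamic programming is only needed on the regions between them; whether SaAlign does exactly this is not stated in this record.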