MAC: Merging Assemblies by Using Adjacency Algebraic Model and Classification

With the generation of large amounts of sequencing data, different assemblers have emerged to perform de novo genome assembly. Because no single strategy fits the varied biases of datasets, none of these tools outperforms the others on all species. The process of assembly reconciliation is to merg...

Bibliographic Details
Main Authors: Tang, Li, Li, Min, Wu, Fang-Xiang, Pan, Yi, Wang, Jianxin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7005248/
https://www.ncbi.nlm.nih.gov/pubmed/32082361
http://dx.doi.org/10.3389/fgene.2019.01396
author Tang, Li
Li, Min
Wu, Fang-Xiang
Pan, Yi
Wang, Jianxin
author_sort Tang, Li
collection PubMed
description With the generation of large amounts of sequencing data, different assemblers have emerged to perform de novo genome assembly. Because no single strategy fits the varied biases of datasets, none of these tools outperforms the others on all species. The process of assembly reconciliation is to merge multiple assemblies and generate a high-quality consensus assembly. Several assembly reconciliation tools have been proposed. However, the existing tools cannot produce a merged assembly that simultaneously has better contiguity and contains fewer errors, and their results usually depend on the ranking of the input assemblies. In this study, we propose a novel assembly reconciliation tool, MAC, which merges assemblies by using the adjacency algebraic model and classification. To address uneven sequencing depth and sequencing errors, MAC identifies consensus blocks between contig sets to construct an adjacency graph. To address repetitive regions, MAC employs classification to optimize the adjacency algebraic model. In addition, MAC uses an overall scoring function to handle the unknown ranking of input assembly sets. Experimental results on four species from GAGE-B demonstrate that MAC outperforms other assembly reconciliation tools.
format Online
Article
Text
id pubmed-7005248
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7005248 2020-02-20 MAC: Merging Assemblies by Using Adjacency Algebraic Model and Classification Tang, Li Li, Min Wu, Fang-Xiang Pan, Yi Wang, Jianxin Front Genet Genetics With the generation of large amounts of sequencing data, different assemblers have emerged to perform de novo genome assembly. Because no single strategy fits the varied biases of datasets, none of these tools outperforms the others on all species. The process of assembly reconciliation is to merge multiple assemblies and generate a high-quality consensus assembly. Several assembly reconciliation tools have been proposed. However, the existing tools cannot produce a merged assembly that simultaneously has better contiguity and contains fewer errors, and their results usually depend on the ranking of the input assemblies. In this study, we propose a novel assembly reconciliation tool, MAC, which merges assemblies by using the adjacency algebraic model and classification. To address uneven sequencing depth and sequencing errors, MAC identifies consensus blocks between contig sets to construct an adjacency graph. To address repetitive regions, MAC employs classification to optimize the adjacency algebraic model. In addition, MAC uses an overall scoring function to handle the unknown ranking of input assembly sets. Experimental results on four species from GAGE-B demonstrate that MAC outperforms other assembly reconciliation tools. Frontiers Media S.A. 2020-01-31 /pmc/articles/PMC7005248/ /pubmed/32082361 http://dx.doi.org/10.3389/fgene.2019.01396 Text en Copyright © 2020 Tang, Li, Wu, Pan and Wang http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title MAC: Merging Assemblies by Using Adjacency Algebraic Model and Classification
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7005248/
https://www.ncbi.nlm.nih.gov/pubmed/32082361
http://dx.doi.org/10.3389/fgene.2019.01396