
dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies

BACKGROUND: Accurate de novo genome assembly has become a reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have...


Bibliographic Details
Main authors: Yavas, Gokhan, Hong, Huixiao, Xiao, Wenming
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6737619/
https://www.ncbi.nlm.nih.gov/pubmed/31510940
http://dx.doi.org/10.1186/s12864-019-6070-x
author Yavas, Gokhan
Hong, Huixiao
Xiao, Wenming
collection PubMed
description BACKGROUND: Accurate de novo genome assembly has become a reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly.
RESULTS: To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET), which generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds, and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms, and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with a decreasing number of misassemblies and decreasing redundancy, and with increasing average contig length and coverage, as expected. For genome builds, the dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies.
CONCLUSIONS: dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that the dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. dnAQET can help researchers to identify the most suitable assembly tools and to select the high-quality assemblies generated.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-6070-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6737619
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6737619 2019-09-16 dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies Yavas, Gokhan Hong, Huixiao Xiao, Wenming BMC Genomics Methodology Article BioMed Central 2019-09-11 /pmc/articles/PMC6737619/ /pubmed/31510940 http://dx.doi.org/10.1186/s12864-019-6070-x Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6737619/
https://www.ncbi.nlm.nih.gov/pubmed/31510940
http://dx.doi.org/10.1186/s12864-019-6070-x