dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies
BACKGROUND: Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have...
Main authors: | Yavas, Gokhan; Hong, Huixiao; Xiao, Wenming |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6737619/ https://www.ncbi.nlm.nih.gov/pubmed/31510940 http://dx.doi.org/10.1186/s12864-019-6070-x |
_version_ | 1783450690307227648 |
---|---|
author | Yavas, Gokhan Hong, Huixiao Xiao, Wenming |
author_facet | Yavas, Gokhan Hong, Huixiao Xiao, Wenming |
author_sort | Yavas, Gokhan |
collection | PubMed |
description | BACKGROUND: Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. RESULTS: To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. CONCLUSIONS: The dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. The dnAQET can help researchers to identify the most suitable assembly tools and to select high quality assemblies generated. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-6070-x) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6737619 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6737619 2019-09-16 dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies Yavas, Gokhan Hong, Huixiao Xiao, Wenming BMC Genomics Methodology Article BACKGROUND: Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. RESULTS: To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. CONCLUSIONS: The dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. The dnAQET can help researchers to identify the most suitable assembly tools and to select high quality assemblies generated. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-6070-x) contains supplementary material, which is available to authorized users. BioMed Central 2019-09-11 /pmc/articles/PMC6737619/ /pubmed/31510940 http://dx.doi.org/10.1186/s12864-019-6070-x Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Yavas, Gokhan Hong, Huixiao Xiao, Wenming dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies |
title | dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies |
title_full | dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies |
title_fullStr | dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies |
title_full_unstemmed | dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies |
title_short | dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies |
title_sort | dnaqet: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6737619/ https://www.ncbi.nlm.nih.gov/pubmed/31510940 http://dx.doi.org/10.1186/s12864-019-6070-x |
work_keys_str_mv | AT yavasgokhan dnaqetaframeworktocomputeaconsolidatedmetricforbenchmarkingqualityofdenovoassemblies AT honghuixiao dnaqetaframeworktocomputeaconsolidatedmetricforbenchmarkingqualityofdenovoassemblies AT xiaowenming dnaqetaframeworktocomputeaconsolidatedmetricforbenchmarkingqualityofdenovoassemblies |