
Normalized N50 assembly metric using gap-restricted co-linear chaining

BACKGROUND: Developing genome assembly tools requires comprehensive and efficiently computable validation measures to assess assembly quality. The most widely used measure, N50, summarizes the assembly results by the length of the scaffold (or contig) overlapping the midpoi...


Bibliographic Details
Main authors: Mäkinen, Veli, Salmela, Leena, Ylinen, Johannes
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3556137/
https://www.ncbi.nlm.nih.gov/pubmed/23031320
http://dx.doi.org/10.1186/1471-2105-13-255
author Mäkinen, Veli
Salmela, Leena
Ylinen, Johannes
collection PubMed
description BACKGROUND: Developing genome assembly tools requires comprehensive and efficiently computable validation measures to assess assembly quality. The most widely used measure, N50, summarizes the assembly results by the length of the scaffold (or contig) overlapping the midpoint of the length-ordered concatenation of scaffolds (contigs). Especially for scaffold assemblies, it is non-trivial to combine a correctness measure with the N50 values, and the current methods for doing this are rather involved. RESULTS: We propose a simple but rigorous normalized N50 assembly metric that combines N50 with such a correctness measure; the assembly is split into as many parts as necessary to align each part to the reference. For scalability, we first compute maximal local approximate matches between scaffolds and the reference in a distributed manner, and then proceed with co-linear chaining to find a global alignment. The best alignment is removed from the scaffold, and the process is iterated with the remaining scaffold content in order to split the scaffold into correctly aligning parts. The proposed normalized N50 metric is then the N50 value computed for the final correctly aligning parts. As a side result of independent interest, we show how to modify co-linear chaining to restrict gaps so that it produces a more sensible global alignment. CONCLUSIONS: We propose and implement a comprehensive and efficient approach to compute a metric that summarizes scaffold assembly correctness and length. Our implementation can be downloaded from http://www.cs.helsinki.fi/group/scaffold/normalizedN50/.
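The N50 definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `n50` and the example lengths are hypothetical, and the split into "correctly aligning parts" is simply assumed as given input rather than computed by alignment and co-linear chaining.

```python
def n50(lengths):
    """Return the length of the piece that overlaps the midpoint of the
    length-ordered (longest first) concatenation of all pieces."""
    if not lengths:
        raise ValueError("at least one scaffold or contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first piece whose cumulative length reaches the midpoint
        # is the one overlapping it.
        if running >= half_total:
            return length


# Hypothetical scaffold lengths of an assembly.
scaffolds = [100, 80, 50, 30, 20]

# Hypothetical outcome of splitting: the 100-long scaffold is assumed to
# align to the reference only as two parts of 60 and 40.
aligned_parts = [60, 40, 80, 50, 30, 20]

plain_n50 = n50(scaffolds)           # N50 of the scaffolds as assembled
normalized_n50 = n50(aligned_parts)  # N50 of the correctly aligning parts
```

Under these assumed inputs the plain N50 is 80 and the value for the split parts is 60, which shows how a misassembled long scaffold lowers the normalized figure.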
format Online
Article
Text
id pubmed-3556137
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3556137 2013-01-31 Normalized N50 assembly metric using gap-restricted co-linear chaining Mäkinen, Veli Salmela, Leena Ylinen, Johannes BMC Bioinformatics Methodology Article BioMed Central 2012-10-03 /pmc/articles/PMC3556137/ /pubmed/23031320 http://dx.doi.org/10.1186/1471-2105-13-255 Text en Copyright ©2012 Mäkinen et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Normalized N50 assembly metric using gap-restricted co-linear chaining
topic Methodology Article