Normalized N50 assembly metric using gap-restricted co-linear chaining
Main authors: | Mäkinen, Veli; Salmela, Leena; Ylinen, Johannes |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3556137/ https://www.ncbi.nlm.nih.gov/pubmed/23031320 http://dx.doi.org/10.1186/1471-2105-13-255 |
author | Mäkinen, Veli; Salmela, Leena; Ylinen, Johannes |
---|---|
collection | PubMed |
description | BACKGROUND: Developing genome assembly tools requires comprehensive, efficiently computable validation measures to assess assembly quality. The most commonly used measure, N50, summarizes assembly results by the length of the scaffold (or contig) that overlaps the midpoint of the length-ordered concatenation of scaffolds (contigs). Especially for scaffold assemblies, it is non-trivial to combine a correctness measure with N50 values, and the current methods for doing so are rather involved. RESULTS: We propose a simple but rigorous normalized N50 assembly metric that combines N50 with such a correctness measure: the assembly is split into as many parts as necessary for each part to align to the reference. For scalability, we first compute maximal local approximate matches between scaffolds and the reference in a distributed manner, and then apply co-linear chaining to find a global alignment. The best alignment is removed from the scaffold, and the process is iterated on the remaining scaffold content in order to split the scaffold into correctly aligning parts. The proposed normalized N50 metric is then the N50 value computed over the final correctly aligning parts. As a side result of independent interest, we show how to modify co-linear chaining to restrict gaps, producing a more sensible global alignment. CONCLUSIONS: We propose and implement a comprehensive and efficient approach to computing a metric that summarizes both the correctness and the length of a scaffold assembly. Our implementation can be downloaded from http://www.cs.helsinki.fi/group/scaffold/normalizedN50/. |
format | Online Article Text |
id | pubmed-3556137 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics (Methodology Article), published 2012-10-03 |
rights | Copyright ©2012 Mäkinen et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Normalized N50 assembly metric using gap-restricted co-linear chaining |
topic | Methodology Article |
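The description above defines N50 and outlines the normalized variant: repeatedly chain local matches to find the best alignment of a scaffold, cut that aligned part off, iterate on the rest, and take the N50 of the resulting parts. The following is a minimal toy sketch of that pipeline in Python, not the authors' implementation. It assumes the local matches ("anchors") are already computed and given as hypothetical half-open forward-strand intervals `(scaffold_start, scaffold_end, ref_start, ref_end)`. The function names, the `max_gap` parameter, the simple O(n²) chaining that forbids any gap larger than `max_gap` in either sequence, and the choice to drop unaligned scaffold content are all illustrative assumptions; the paper's exact gap-restriction rule and handling of unaligned sequence may differ.

```python
from typing import List, Tuple

# Hypothetical anchor format (assumption): half-open, forward-strand intervals
# (scaffold_start, scaffold_end, ref_start, ref_end).
Anchor = Tuple[int, int, int, int]


def n50(lengths: List[int]) -> int:
    """Length of the first piece, taken longest first, at which the cumulative
    length reaches half the total (the piece overlapping the midpoint)."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0  # empty input


def best_chain(anchors: List[Anchor], max_gap: int) -> List[Anchor]:
    """Toy O(n^2) co-linear chaining maximizing scaffold bases covered.
    Assumed gap restriction: consecutive anchors may not overlap and may be
    at most max_gap apart in both the scaffold and the reference."""
    order = sorted(anchors)  # sorted by scaffold start
    n = len(order)
    if n == 0:
        return []
    score = [a[1] - a[0] for a in order]  # best chain score ending at each anchor
    prev = [-1] * n  # back-pointers for recovering the chain
    for j, (s_j, e_j, r_j, _) in enumerate(order):
        for i in range(j):
            _, e_i, _, f_i = order[i]
            gap_scaf = s_j - e_i
            gap_ref = r_j - f_i
            if 0 <= gap_scaf <= max_gap and 0 <= gap_ref <= max_gap:
                cand = score[i] + (e_j - s_j)
                if cand > score[j]:
                    score[j] = cand
                    prev[j] = i
    j = max(range(n), key=score.__getitem__)
    chain = []
    while j != -1:
        chain.append(order[j])
        j = prev[j]
    return chain[::-1]


def aligned_parts(anchors: List[Anchor], max_gap: int) -> List[int]:
    """Split one scaffold into aligning parts by repeatedly taking the best
    chain and discarding every anchor overlapping its scaffold span."""
    remaining = [a for a in anchors if a[1] > a[0]]  # positive length guarantees progress
    parts = []
    while remaining:
        chain = best_chain(remaining, max_gap)
        start, end = chain[0][0], chain[-1][1]
        parts.append(end - start)
        remaining = [a for a in remaining if a[1] <= start or a[0] >= end]
    return parts


def normalized_n50(scaffolds: List[List[Anchor]], max_gap: int) -> int:
    """N50 over the aligning parts of all scaffolds."""
    parts: List[int] = []
    for anchors in scaffolds:
        parts.extend(aligned_parts(anchors, max_gap))
    return n50(parts)


if __name__ == "__main__":
    # Made-up toy data: scaffold A (length 300) ends with a misjoined segment.
    scaffold_a = [(0, 100, 1000, 1100), (110, 200, 1110, 1200),
                  (200, 300, 5000, 5100)]
    scaffold_b = [(0, 150, 3000, 3150)]  # scaffold B, length 150
    print(aligned_parts(scaffold_a, max_gap=50))
    print(n50([300, 150]))
    print(normalized_n50([scaffold_a, scaffold_b], max_gap=50))
```

On this made-up data, scaffold A splits into parts of length 200 and 100 because its last anchor jumps far away on the reference, so the normalized value is 150 while the plain N50 of the scaffold lengths (300 and 150) is 300. A production version would need the efficient chaining and distributed match finding the description mentions; the quadratic loop here only keeps the idea readable.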