SuRankCo: supervised ranking of contigs in de novo assemblies

Bibliographic Details
Main Authors: Kuhring, Mathias; Dabrowski, Piotr Wojtek; Piro, Vitor C.; Nitsche, Andreas; Renard, Bernhard Y.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4520199/
https://www.ncbi.nlm.nih.gov/pubmed/26224355
http://dx.doi.org/10.1186/s12859-015-0644-7
Description
Summary: BACKGROUND: Evaluating the quality and reliability of a de novo assembly and of single contigs in particular is challenging since commonly a ground truth is not readily available and numerous factors may influence results. Currently available procedures provide assembly scores but lack a comparative quality ranking of contigs within an assembly. RESULTS: We present SuRankCo, which relies on a machine learning approach to predict quality scores for contigs and to enable the ranking of contigs within an assembly. The result is a sorted contig set which allows selective contig usage in downstream analysis. Benchmarking on datasets with known ground truth shows promising sensitivity and specificity and favorable comparison to existing methodology. CONCLUSIONS: SuRankCo analyzes the reliability of de novo assemblies on the contig level and thereby allows quality control and ranking prior to further downstream and validation experiments. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0644-7) contains supplementary material, which is available to authorized users.