Haploid to diploid alignment for variation calling assessment

MOTIVATION: Variation calling is the process of detecting differences between donor and consensus DNA via high-throughput sequencing read mapping. When evaluating the performance of different variation calling methods, a typical scenario is to simulate artificial (diploid) genomes and sample reads f...

Bibliographic Details
Main authors: Mäkinen, Veli; Rahkola, Jani
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3852041/
https://www.ncbi.nlm.nih.gov/pubmed/24564537
http://dx.doi.org/10.1186/1471-2105-14-S15-S13
author Mäkinen, Veli
Rahkola, Jani
collection PubMed
description MOTIVATION: Variation calling is the process of detecting differences between donor and consensus DNA via high-throughput sequencing read mapping. When evaluating the performance of different variation calling methods, a typical scenario is to simulate artificial (diploid) genomes and sample reads from those. After variation calling, one can then compute precision and recall statistics. This works reliably on SNPs, but on larger indels there is the problem of invariance: a predicted deletion/insertion can differ slightly from the true one, yet both make the same change to the genome. Also, exactly correct predictions are rare, especially on larger insertions, so one should consider some notion of approximate predictions for fair comparison. RESULTS: We propose a full genome alignment-based strategy that allows for fair comparison of variation calling predictions: First, we apply the predicted variations to the consensus genome to create as many haploid genomes as are necessary to explain the variations. Second, we align the haploid genomes to the (aligned) artificial diploid genomes, allowing arbitrary recombinations. The resulting haploid to diploid alignments tell how much the predictions differ from the true ones, solving the invariance issues in direct variation comparison. In an effort to make the approach scalable to real genomes, we develop a simple variant of the classical edit distance dynamic programming algorithm and apply the diagonal doubling technique to optimise the computation. We experiment with the approach on simulated predictions and also on real prediction data from a variation calling challenge.
format Online
Article
Text
id pubmed-3852041
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3852041 2013-12-20 Haploid to diploid alignment for variation calling assessment Mäkinen, Veli Rahkola, Jani BMC Bioinformatics Proceedings BioMed Central 2013-10-15 /pmc/articles/PMC3852041/ /pubmed/24564537 http://dx.doi.org/10.1186/1471-2105-14-S15-S13 Text en Copyright © 2013 Veli and Rahkola; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Haploid to diploid alignment for variation calling assessment
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3852041/
https://www.ncbi.nlm.nih.gov/pubmed/24564537
http://dx.doi.org/10.1186/1471-2105-14-S15-S13
work_keys_str_mv AT makinenveli haploidtodiploidalignmentforvariationcallingassessment
AT rahkolajani haploidtodiploidalignmentforvariationcallingassessment
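
The alignment step summarised in the description field can be illustrated with a small sketch. It is not the authors' implementation: it assumes the diploid genome is given as two equal-length gapped rows of a pairwise alignment, that switching between the two rows (recombination) is free at every column, and that substitutions, insertions and deletions each cost 1. The function name haploid_to_diploid_distance and its parameters are hypothetical. The diagonal doubling optimisation mentioned in the abstract is not included; this is the plain quadratic dynamic programme.

```python
def haploid_to_diploid_distance(row_a, row_b, haploid, gap="-"):
    """Minimum unit-cost edit distance between `haploid` and any
    recombinant of the two aligned diploid rows.

    Assumptions (illustrative, not taken from the paper):
    - row_a and row_b are rows of a pairwise alignment (same length),
      with `gap` marking alignment gaps;
    - recombination between the rows is free at every column.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned diploid rows must have equal length")

    m = len(haploid)
    # prev[j]: best cost of aligning the columns seen so far
    # against the prefix haploid[:j].
    prev = list(range(m + 1))  # no columns yet: insert j characters

    for a, b in zip(row_a, row_b):
        choices = {a, b}  # characters offered by this column
        curr = [0] * (m + 1)
        for j in range(m + 1):
            best = float("inf")
            for c in choices:
                if c == gap:
                    # A gap contributes no character in that haplotype.
                    best = min(best, prev[j])
                else:
                    # Delete the diploid character c.
                    best = min(best, prev[j] + 1)
                    if j > 0:
                        # Match or substitute c against haploid[j-1].
                        best = min(best, prev[j - 1] + (c != haploid[j - 1]))
            if j > 0:
                # Insert haploid[j-1] without consuming a new column.
                best = min(best, curr[j - 1] + 1)
            curr[j] = best
        prev = curr

    return prev[m]
```

Under these assumptions, haploid_to_diploid_distance("AATT", "CCGG", "AAGG") is 0, because the haploid sequence is a recombinant of the two rows, while aligning "AAGG" against "AATT" alone (passing it as both rows) costs 2.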