Towards pan-genome read alignment to improve variation calling
BACKGROUND: A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting of >15,000 whole-genomes and >126,000 exome sequences from different individuals. Despite this enormous diversity,...
Main authors: | Valenzuela, Daniel; Norri, Tuukka; Välimäki, Niko; Pitkänen, Esa; Mäkinen, Veli |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5954285/ https://www.ncbi.nlm.nih.gov/pubmed/29764365 http://dx.doi.org/10.1186/s12864-018-4465-8 |
_version_ | 1783323490824224768 |
---|---|
author | Valenzuela, Daniel; Norri, Tuukka; Välimäki, Niko; Pitkänen, Esa; Mäkinen, Veli |
author_sort | Valenzuela, Daniel |
collection | PubMed |
description | BACKGROUND: A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting of >15,000 whole-genomes and >126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants are typically carried out on short-read data aligned to a single reference, disregarding the underlying variation. RESULTS: We propose a new unified framework for variant calling with short-read data utilizing a representation of human genetic variation – a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC. CONCLUSIONS: Our experiments show that by replacing a standard human reference with a pan-genomic one we achieve an improvement in single-nucleotide variant calling accuracy and in short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions. |
format | Online Article Text |
id | pubmed-5954285 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5954285 2018-05-21 Towards pan-genome read alignment to improve variation calling Valenzuela, Daniel; Norri, Tuukka; Välimäki, Niko; Pitkänen, Esa; Mäkinen, Veli BMC Genomics Research BioMed Central 2018-05-09 /pmc/articles/PMC5954285/ /pubmed/29764365 http://dx.doi.org/10.1186/s12864-018-4465-8 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Towards pan-genome read alignment to improve variation calling |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5954285/ https://www.ncbi.nlm.nih.gov/pubmed/29764365 http://dx.doi.org/10.1186/s12864-018-4465-8 |