Founder reconstruction enables scalable and seamless pangenomic analysis

MOTIVATION: Variant calling workflows that utilize a single reference sequence are the de facto standard elementary genomic analysis routine for resequencing projects. Various ways to enhance the reference with pangenomic information have been proposed, but scalability combined with seamless integra...

Bibliographic Details
Main authors: Norri, Tuukka; Cazaux, Bastien; Dönges, Saska; Valenzuela, Daniel; Mäkinen, Veli
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8665761/
https://www.ncbi.nlm.nih.gov/pubmed/34260702
http://dx.doi.org/10.1093/bioinformatics/btab516
author Norri, Tuukka
Cazaux, Bastien
Dönges, Saska
Valenzuela, Daniel
Mäkinen, Veli
collection PubMed
description MOTIVATION: Variant calling workflows that use a single reference sequence are the de facto standard elementary genomic analysis routine for resequencing projects. Various ways to enhance the reference with pangenomic information have been proposed, but combining scalability with seamless integration into existing workflows remains a challenge. RESULTS: We present PanVC with founder sequences, a scalable and accurate variant calling workflow based on a multiple alignment of reference sequences. Scalability is achieved by removing duplicate parts, up to a limit, to form a founder multiple alignment, which is then indexed using a hybrid scheme that exploits general-purpose read aligners. Our implemented workflow uses GATK or BCFtools for variant calling, but the individual steps of the workflow (e.g. the vcf2multialign tool, founder reconstruction) can be of independent interest as a basis for creating novel pangenome analysis workflows beyond variant calling. AVAILABILITY AND IMPLEMENTATION: Our open access tools and instructions on how to reproduce our experiments are available at https://github.com/algbio/panvc-founders. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8665761
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8665761 2021-12-13 Founder reconstruction enables scalable and seamless pangenomic analysis Norri, Tuukka Cazaux, Bastien Dönges, Saska Valenzuela, Daniel Mäkinen, Veli Bioinformatics Original Papers Oxford University Press 2021-07-14 /pmc/articles/PMC8665761/ /pubmed/34260702 http://dx.doi.org/10.1093/bioinformatics/btab516 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Founder reconstruction enables scalable and seamless pangenomic analysis
topic Original Papers