PanKmer: k-mer-based and reference-free pangenome analysis

SUMMARY: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex euka...

Bibliographic Details
Main Authors:	Aylward, Anthony J; Petrus, Semar; Mamerto, Allen; Hartwick, Nolan T; Michael, Todd P
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Applications Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603592/ https://www.ncbi.nlm.nih.gov/pubmed/37846049 http://dx.doi.org/10.1093/bioinformatics/btad621

_version_	1785126636286377984
author	Aylward, Anthony J; Petrus, Semar; Mamerto, Allen; Hartwick, Nolan T; Michael, Todd P
author_facet	Aylward, Anthony J; Petrus, Semar; Mamerto, Allen; Hartwick, Nolan T; Michael, Todd P
author_sort	Aylward, Anthony J
collection	PubMed
description	SUMMARY: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence–absence values in each genome. These are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. It also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be “anchored” in any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full scope of genetic variation in a population, without reference bias. AVAILABILITY AND IMPLEMENTATION: PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as Gitlab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/.
format	Online Article Text
id	pubmed-10603592
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10603592 2023-10-28 PanKmer: k-mer-based and reference-free pangenome analysis. Aylward, Anthony J; Petrus, Semar; Mamerto, Allen; Hartwick, Nolan T; Michael, Todd P. Bioinformatics. Applications Note. Oxford University Press 2023-10-16. /pmc/articles/PMC10603592/ /pubmed/37846049 http://dx.doi.org/10.1093/bioinformatics/btad621 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	PanKmer: k-mer-based and reference-free pangenome analysis
title_sort	pankmer: k-mer-based and reference-free pangenome analysis
topic	Applications Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603592/ https://www.ncbi.nlm.nih.gov/pubmed/37846049 http://dx.doi.org/10.1093/bioinformatics/btad621
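
The description above says that PanKmer decomposes input genomes into a table of observed k-mers with presence-absence values per genome, computes similarity statistics between individuals, and lets k-mers be "anchored" in one genome to measure conservation at a locus. The following is a minimal pure-Python sketch of those three ideas only. It does not use the PanKmer package or reproduce its index format or API; every function name here (`canonical`, `presence_absence_table`, `jaccard`, `local_conservation`) is hypothetical, and the use of canonical (strand-independent) k-mers and Jaccard similarity are assumptions chosen for illustration.

```python
# Illustrative sketch only: NOT the PanKmer API or its on-disk index format.
# All names below are hypothetical.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(sequence, k):
    """Collect the canonical k-mers of one sequence, skipping ambiguous bases."""
    seq = sequence.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _BASES:
            kmers.add(canonical(kmer))
    return kmers


def presence_absence_table(genomes, k):
    """Map each observed k-mer to a 0/1 list with one entry per genome."""
    names = list(genomes)
    table = {}
    for idx, name in enumerate(names):
        for kmer in kmer_set(genomes[name], k):
            table.setdefault(kmer, [0] * len(names))[idx] = 1
    return names, table


def jaccard(table, i, j):
    """Jaccard similarity of genomes i and j: shared k-mers over their union."""
    shared = sum(1 for row in table.values() if row[i] and row[j])
    union = sum(1 for row in table.values() if row[i] or row[j])
    return shared / union if union else 0.0


def local_conservation(table, n_genomes, sequence, k, start, end):
    """Anchor k-mers in one sequence and report, for each start position in
    [start, end), the fraction of genomes that contain that k-mer."""
    scores = []
    for i in range(start, min(end, len(sequence) - k + 1)):
        row = table.get(canonical(sequence[i:i + k].upper()))
        scores.append(sum(row) / n_genomes if row else 0.0)
    return scores


if __name__ == "__main__":
    # Toy sequences standing in for whole genomes.
    genomes = {"A": "ACGTAC", "B": "ACGTTT"}
    names, table = presence_absence_table(genomes, k=3)
    print(names)
    print(jaccard(table, 0, 1))
    print(local_conservation(table, len(names), genomes["A"], 3, 0, 4))
```

A real index over dozens to thousands of genomes would need a far more compact encoding than Python dictionaries and lists; the sketch only shows the logical shape of the table and the kinds of statistics that can be derived from it.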
