
PanKmer: k-mer-based and reference-free pangenome analysis

SUMMARY: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex euka...


Bibliographic Details
Main authors: Aylward, Anthony J, Petrus, Semar, Mamerto, Allen, Hartwick, Nolan T, Michael, Todd P
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603592/
https://www.ncbi.nlm.nih.gov/pubmed/37846049
http://dx.doi.org/10.1093/bioinformatics/btad621
author Aylward, Anthony J
Petrus, Semar
Mamerto, Allen
Hartwick, Nolan T
Michael, Todd P
collection PubMed
description SUMMARY: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence–absence values in each genome. These are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. It also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be “anchored” in any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full scope of genetic variation in a population, without reference bias. AVAILABILITY AND IMPLEMENTATION: PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as GitLab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/.
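Illustrative note: the k-mer presence–absence table described in the summary can be sketched in a few lines of plain Python. This is a minimal sketch of the general idea only, not PanKmer's implementation or API. The function names (`canonical_kmers`, `presence_absence_table`, `jaccard`) are hypothetical, the use of canonical (strand-independent) k-mers is an assumption, and Jaccard similarity stands in here for the "sequence similarity statistics" the summary mentions without naming a specific one.

```python
# Reverse-complement lookup for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers in seq.

    A canonical k-mer is the lexicographically smaller of a k-mer and its
    reverse complement, so both strands map to the same key (an assumption
    of this sketch).
    """
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip windows containing ambiguous bases such as N
        reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, reverse_complement))
    return kmers


def presence_absence_table(genomes, k):
    """Map every observed k-mer to a tuple of 0/1 flags, one per genome.

    genomes is a dict of {genome_name: sequence}. The returned list of
    names gives the column order of the flags.
    """
    names = sorted(genomes)
    per_genome = {name: canonical_kmers(genomes[name], k) for name in names}
    all_kmers = set().union(*per_genome.values())
    table = {
        kmer: tuple(int(kmer in per_genome[name]) for name in names)
        for kmer in sorted(all_kmers)
    }
    return names, table


def jaccard(table, i, j):
    """Jaccard similarity between genome columns i and j of the table."""
    shared = sum(1 for row in table.values() if row[i] and row[j])
    union = sum(1 for row in table.values() if row[i] or row[j])
    return shared / union if union else 0.0


if __name__ == "__main__":
    # Two toy "genomes" that differ by a single terminal base.
    names, table = presence_absence_table({"A": "AAACCC", "B": "AAACCG"}, k=3)
    for kmer, flags in table.items():
        print(kmer, flags)
    print("Jaccard(A, B) =", jaccard(table, 0, 1))
```

In this toy input a single-base difference changes only the k-mers overlapping it, which is how a presence–absence table can reflect SNPs and other variants without a reference genome. A real index for thousands of genomes would need a far more compact encoding than Python dicts.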
format Online
Article
Text
id pubmed-10603592
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10603592 2023-10-28 PanKmer: k-mer-based and reference-free pangenome analysis Aylward, Anthony J Petrus, Semar Mamerto, Allen Hartwick, Nolan T Michael, Todd P Bioinformatics Applications Note
Oxford University Press 2023-10-16 /pmc/articles/PMC10603592/ /pubmed/37846049 http://dx.doi.org/10.1093/bioinformatics/btad621 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PanKmer: k-mer-based and reference-free pangenome analysis
topic Applications Note