PanKmer: k-mer-based and reference-free pangenome analysis
SUMMARY: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex euka...
Main Authors: | Aylward, Anthony J; Petrus, Semar; Mamerto, Allen; Hartwick, Nolan T; Michael, Todd P |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603592/ https://www.ncbi.nlm.nih.gov/pubmed/37846049 http://dx.doi.org/10.1093/bioinformatics/btad621 |
_version_ | 1785126636286377984 |
---|---|
author | Aylward, Anthony J Petrus, Semar Mamerto, Allen Hartwick, Nolan T Michael, Todd P |
author_facet | Aylward, Anthony J Petrus, Semar Mamerto, Allen Hartwick, Nolan T Michael, Todd P |
author_sort | Aylward, Anthony J |
collection | PubMed |
description | SUMMARY: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence–absence values in each genome. These are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. It also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be “anchored” in any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full scope of genetic variation in a population, without reference bias. AVAILABILITY AND IMPLEMENTATION: PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as Gitlab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/. |
format | Online Article Text |
id | pubmed-10603592 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10603592 2023-10-28 PanKmer: k-mer-based and reference-free pangenome analysis Aylward, Anthony J Petrus, Semar Mamerto, Allen Hartwick, Nolan T Michael, Todd P Bioinformatics Applications Note Oxford University Press 2023-10-16 /pmc/articles/PMC10603592/ /pubmed/37846049 http://dx.doi.org/10.1093/bioinformatics/btad621 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Note Aylward, Anthony J Petrus, Semar Mamerto, Allen Hartwick, Nolan T Michael, Todd P PanKmer: k-mer-based and reference-free pangenome analysis |
title | PanKmer: k-mer-based and reference-free pangenome analysis |
title_full | PanKmer: k-mer-based and reference-free pangenome analysis |
title_fullStr | PanKmer: k-mer-based and reference-free pangenome analysis |
title_full_unstemmed | PanKmer: k-mer-based and reference-free pangenome analysis |
title_short | PanKmer: k-mer-based and reference-free pangenome analysis |
title_sort | pankmer: k-mer-based and reference-free pangenome analysis |
topic | Applications Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603592/ https://www.ncbi.nlm.nih.gov/pubmed/37846049 http://dx.doi.org/10.1093/bioinformatics/btad621 |
work_keys_str_mv | AT aylwardanthonyj pankmerkmerbasedandreferencefreepangenomeanalysis AT petrussemar pankmerkmerbasedandreferencefreepangenomeanalysis AT mamertoallen pankmerkmerbasedandreferencefreepangenomeanalysis AT hartwicknolant pankmerkmerbasedandreferencefreepangenomeanalysis AT michaeltoddp pankmerkmerbasedandreferencefreepangenomeanalysis |
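
The description above says that input genomes are decomposed into a table of observed k-mers with presence–absence values per genome, from which similarity statistics between individuals can be calculated. The following is a minimal pure-Python sketch of that general idea, not PanKmer's actual API or index format: the function names (`canonical_kmers`, `presence_absence_table`, `jaccard`), the use of canonical k-mers, and the choice of Jaccard similarity are all illustrative assumptions.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT
# part of the PanKmer package.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers in seq.

    A canonical k-mer is the lexicographically smaller of a k-mer and its
    reverse complement, so both strands map to the same key (an assumption
    made here for illustration).
    """
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
            revcomp = kmer.translate(COMPLEMENT)[::-1]
            kmers.add(min(kmer, revcomp))
    return kmers


def presence_absence_table(genomes, k):
    """Map each observed k-mer to a tuple of booleans, one per genome.

    `genomes` is a dict of name -> sequence; columns follow sorted names.
    """
    names = sorted(genomes)
    sets = {name: canonical_kmers(genomes[name], k) for name in names}
    all_kmers = set().union(*sets.values())
    table = {kmer: tuple(kmer in sets[n] for n in names) for kmer in all_kmers}
    return names, table


def jaccard(table, i, j):
    """Jaccard similarity between genome columns i and j of the table."""
    shared = sum(1 for row in table.values() if row[i] and row[j])
    union = sum(1 for row in table.values() if row[i] or row[j])
    return shared / union if union else 0.0


if __name__ == "__main__":
    # Toy sequences standing in for whole genomes.
    toy = {"a": "ACGTACGTGA", "b": "ACGTACGTGA", "c": "TTTTTTTTTT"}
    names, table = presence_absence_table(toy, k=3)
    print(names)
    print(jaccard(table, 0, 1), jaccard(table, 0, 2))
```

A real index over dozens to thousands of eukaryotic genomes would need a far more compact encoding than a Python dict of tuples; the sketch is meant only to make the presence–absence representation concrete.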