
Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation

In less than a decade, population genomics of microbes has progressed from the effort of sequencing dozens of strains to thousands, or even tens of thousands of strains in a single study. There are now hundreds of thousands of genomes available even for a single bacterial species, and the number of...


Bibliographic Details
Main authors: Lees, John A., Tonkin-Hill, Gerry, Yang, Zhirong, Corander, Jukka
Format: Online Article Text
Language: English
Published: The Royal Society 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9393562/
https://www.ncbi.nlm.nih.gov/pubmed/35989601
http://dx.doi.org/10.1098/rstb.2021.0237
_version_ 1784771296463159296
author Lees, John A.
Tonkin-Hill, Gerry
Yang, Zhirong
Corander, Jukka
collection PubMed
description In less than a decade, population genomics of microbes has progressed from the effort of sequencing dozens of strains to thousands, or even tens of thousands of strains in a single study. There are now hundreds of thousands of genomes available even for a single bacterial species, and the number of genomes is expected to continue to increase at an accelerated pace given the advances in sequencing technology and widespread genomic surveillance initiatives. This explosion of data calls for innovative methods to enable rapid exploration of the structure of a population based on different data modalities, such as multiple sequence alignments, assemblies and estimates of gene content across different genomes. Here, we present Mandrake, an efficient implementation of a dimensional reduction method tailored for the needs of large-scale population genomics. Mandrake is capable of visualizing population structure from millions of whole genomes, and we illustrate its usefulness with several datasets representing major pathogens. Our method is freely available both as an analysis pipeline (https://github.com/johnlees/mandrake) and as a browser-based interactive application (https://gtonkinhill.github.io/mandrake-web/). This article is part of a discussion meeting issue ‘Genomic population structures of microbial pathogens’.
format Online
Article
Text
id pubmed-9393562
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher The Royal Society
record_format MEDLINE/PubMed
spelling pubmed-9393562 2022-08-30 Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation Lees, John A. Tonkin-Hill, Gerry Yang, Zhirong Corander, Jukka Philos Trans R Soc Lond B Biol Sci Articles In less than a decade, population genomics of microbes has progressed from the effort of sequencing dozens of strains to thousands, or even tens of thousands of strains in a single study. There are now hundreds of thousands of genomes available even for a single bacterial species, and the number of genomes is expected to continue to increase at an accelerated pace given the advances in sequencing technology and widespread genomic surveillance initiatives. This explosion of data calls for innovative methods to enable rapid exploration of the structure of a population based on different data modalities, such as multiple sequence alignments, assemblies and estimates of gene content across different genomes. Here, we present Mandrake, an efficient implementation of a dimensional reduction method tailored for the needs of large-scale population genomics. Mandrake is capable of visualizing population structure from millions of whole genomes, and we illustrate its usefulness with several datasets representing major pathogens. Our method is freely available both as an analysis pipeline (https://github.com/johnlees/mandrake) and as a browser-based interactive application (https://gtonkinhill.github.io/mandrake-web/). This article is part of a discussion meeting issue ‘Genomic population structures of microbial pathogens’. The Royal Society 2022-10-10 2022-08-22 /pmc/articles/PMC9393562/ /pubmed/35989601 http://dx.doi.org/10.1098/rstb.2021.0237 Text en © 2022 The Authors.
https://creativecommons.org/licenses/by/4.0/ Published by the Royal Society under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, provided the original author and source are credited.
title Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9393562/
https://www.ncbi.nlm.nih.gov/pubmed/35989601
http://dx.doi.org/10.1098/rstb.2021.0237