Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution

Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as de novo assembly, alignment and phylogenetic inference are combined to form phylogenetic...

Bibliographic Details
Main authors: van der Putten, Boas C. L.; Huijsmans, Niek A. H.; Mende, Daniel R.; Schultsz, Constance
Format: Online Article Text
Language: English
Published: Microbiology Society 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9176278/
https://www.ncbi.nlm.nih.gov/pubmed/35290758
http://dx.doi.org/10.1099/mgen.0.000799
_version_ 1784722630839894016
author van der Putten, Boas C. L.
Huijsmans, Niek A. H.
Mende, Daniel R.
Schultsz, Constance
author_facet van der Putten, Boas C. L.
Huijsmans, Niek A. H.
Mende, Daniel R.
Schultsz, Constance
author_sort van der Putten, Boas C. L.
collection PubMed
description Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as de novo assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking. To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to a genetic diversity usually observed within bacterial species (≥95 % average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked 19 distinct phylogenetic workflows using 8 different simulated datasets. We found that recently developed k-mer alignment methods such as kSNP and ska achieve similar accuracy as reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of de novo assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or skesa outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates. We conclude that for phylogenomic reconstruction, k-mer alignment methods are relevant alternatives to reference mapping at the species level, especially in the absence of suitable reference genomes. We show de novo genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction.
format Online
Article
Text
id pubmed-9176278
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-9176278 2022-06-09 Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution van der Putten, Boas C. L. Huijsmans, Niek A. H. Mende, Daniel R. Schultsz, Constance Microb Genom Research Articles Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as de novo assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking. To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to a genetic diversity usually observed within bacterial species (≥95 % average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked 19 distinct phylogenetic workflows using 8 different simulated datasets. We found that recently developed k-mer alignment methods such as kSNP and ska achieve similar accuracy as reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of de novo assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or skesa outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates. We conclude that for phylogenomic reconstruction, k-mer alignment methods are relevant alternatives to reference mapping at the species level, especially in the absence of suitable reference genomes. We show de novo genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction. Microbiology Society 2022-03-15 /pmc/articles/PMC9176278/ /pubmed/35290758 http://dx.doi.org/10.1099/mgen.0.000799 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License.
spellingShingle Research Articles
van der Putten, Boas C. L.
Huijsmans, Niek A. H.
Mende, Daniel R.
Schultsz, Constance
Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution
title Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution
title_full Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution
title_fullStr Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution
title_full_unstemmed Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution
title_short Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution
title_sort benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9176278/
https://www.ncbi.nlm.nih.gov/pubmed/35290758
http://dx.doi.org/10.1099/mgen.0.000799
work_keys_str_mv AT vanderputtenboascl benchmarkingthetopologicalaccuracyofbacterialphylogenomicworkflowsusinginsilicoevolution
AT huijsmansniekah benchmarkingthetopologicalaccuracyofbacterialphylogenomicworkflowsusinginsilicoevolution
AT mendedanielr benchmarkingthetopologicalaccuracyofbacterialphylogenomicworkflowsusinginsilicoevolution
AT schultszconstance benchmarkingthetopologicalaccuracyofbacterialphylogenomicworkflowsusinginsilicoevolution