TreeToReads - a pipeline for simulating raw reads from phylogenies

BACKGROUND: Using phylogenomic analysis tools for tracking pathogens has become standard practice in academia, public health agencies, and large industries. Using the same raw read genomic data as input, several different approaches are used to infer phylogenetic trees. These include many...

Bibliographic Details
Main Authors: McTavish, Emily Jane; Pettengill, James; Davis, Steven; Rand, Hugh; Strain, Errol; Allard, Marc; Timme, Ruth E.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5359950/
https://www.ncbi.nlm.nih.gov/pubmed/28320310
http://dx.doi.org/10.1186/s12859-017-1592-1
author McTavish, Emily Jane
Pettengill, James
Davis, Steven
Rand, Hugh
Strain, Errol
Allard, Marc
Timme, Ruth E.
collection PubMed
description BACKGROUND: Using phylogenomic analysis tools for tracking pathogens has become standard practice in academia, public health agencies, and large industries. Using the same raw read genomic data as input, several different approaches are used to infer phylogenetic trees. These include many different SNP pipelines, wgMLST approaches, k-mer algorithms, whole genome alignment, and others. Each has advantages and disadvantages: some have been extensively validated, some are faster, and some have higher resolution. A few of these analysis approaches are well integrated into the regulatory process of US Federal agencies (e.g., the FDA’s SNP pipeline for tracking foodborne pathogens). However, despite extensive validation on benchmark datasets and comparison with other pipelines, we lack methods for fully exploring how the multiple parameter values in each pipeline affect whether the correct phylogenetic tree is recovered. RESULTS: To resolve this problem, we offer a program, TreeToReads, which can generate raw read data from mutated genomes simulated under a known phylogeny. This simulation pipeline allows direct comparisons of simulated and observed data in a controlled environment. At each step of these simulations, researchers can vary parameters of interest (e.g., input tree topology, amount of sequence divergence, rate of indels, read coverage, distance of reference genome, etc.) to assess the effects of various parameter values on correctly calling SNPs and reconstructing an accurate tree. CONCLUSIONS: Such critical assessments of the accuracy and robustness of analytical pipelines are essential to progress in both research and applied settings.
format Online
Article
Text
id pubmed-5359950
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5359950 2017-03-22 TreeToReads - a pipeline for simulating raw reads from phylogenies McTavish, Emily Jane; Pettengill, James; Davis, Steven; Rand, Hugh; Strain, Errol; Allard, Marc; Timme, Ruth E. BMC Bioinformatics Software
BioMed Central 2017-03-20 /pmc/articles/PMC5359950/ /pubmed/28320310 http://dx.doi.org/10.1186/s12859-017-1592-1 Text en © The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title TreeToReads - a pipeline for simulating raw reads from phylogenies
topic Software