Evaluation of SNP calling methods for closely related bacterial isolates and a novel high-accuracy pipeline: BactSNP

Bacteria are highly diverse, even within a species; thus, there have been many studies which classify a single species into multiple types and analyze the genetic differences between them. Recently, the use of whole-genome sequencing (WGS) has been popular for these analyses, and the identification...

Bibliographic Details
Main authors: Yoshimura, Dai, Kajitani, Rei, Gotoh, Yasuhiro, Katahira, Katsuyuki, Okuno, Miki, Ogura, Yoshitoshi, Hayashi, Tetsuya, Itoh, Takehiko
Format: Online Article Text
Language: English
Published: Microbiology Society 2019
Subjects: Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6562250/
https://www.ncbi.nlm.nih.gov/pubmed/31099741
http://dx.doi.org/10.1099/mgen.0.000261
author Yoshimura, Dai
Kajitani, Rei
Gotoh, Yasuhiro
Katahira, Katsuyuki
Okuno, Miki
Ogura, Yoshitoshi
Hayashi, Tetsuya
Itoh, Takehiko
collection PubMed
description Bacteria are highly diverse, even within a species; thus, there have been many studies which classify a single species into multiple types and analyze the genetic differences between them. Recently, the use of whole-genome sequencing (WGS) has been popular for these analyses, and the identification of single-nucleotide polymorphisms (SNPs) between isolates is the most basic analysis performed following WGS. The performance of SNP-calling methods therefore has a significant effect on the accuracy of downstream analyses, such as phylogenetic tree inference. In particular, when closely related isolates are analyzed, e.g. in outbreak investigations, some SNP callers tend to detect a high number of false-positive SNPs compared with the limited number of true SNPs among isolates. However, the performances of various SNP callers in such a situation have not been validated sufficiently. Here, we show the results of realistic benchmarks of commonly used SNP callers, revealing that some of them exhibit markedly low accuracy when target isolates are closely related. As an alternative, we developed a novel pipeline BactSNP, which utilizes both assembly and mapping information and is capable of highly accurate and sensitive SNP calling in a single step. BactSNP is also able to call SNPs among isolates when the reference genome is a draft one or even when the user does not input the reference genome. BactSNP is available at https://github.com/IEkAdN/BactSNP.
format Online
Article
Text
id pubmed-6562250
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-6562250 2019-06-21 Microb Genom Methods
Microbiology Society 2019-05-17 /pmc/articles/PMC6562250/ /pubmed/31099741 http://dx.doi.org/10.1099/mgen.0.000261 Text en © 2019 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evaluation of SNP calling methods for closely related bacterial isolates and a novel high-accuracy pipeline: BactSNP
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6562250/
https://www.ncbi.nlm.nih.gov/pubmed/31099741
http://dx.doi.org/10.1099/mgen.0.000261