
Construction of a virtual Mycobacterium tuberculosis consensus genome and its application to data from a next generation sequencer

Bibliographic Details
Main Authors: Okumura, Kayo; Kato, Masako; Kirikae, Teruo; Kayano, Mitsunori; Miyoshi-Akiyama, Tohru
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4425900/
https://www.ncbi.nlm.nih.gov/pubmed/25879806
http://dx.doi.org/10.1186/s12864-015-1368-9
author Okumura, Kayo
Kato, Masako
Kirikae, Teruo
Kayano, Mitsunori
Miyoshi-Akiyama, Tohru
collection PubMed
description BACKGROUND: Although Mycobacterium tuberculosis isolates comprise several different lineages, epidemiological analyses are usually performed relative to a single reference genome, M. tuberculosis H37Rv, which might bias the results. These analyses are essentially based on genome sequence information of M. tuberculosis and could in theory be performed in silico, using whole genome sequence (WGS) data available in databases or obtained with next generation sequencers (NGSs). To establish higher-resolution methods for such analyses, whole genome sequences of M. tuberculosis complex (MTBC) strains available in databases were aligned to construct a virtual reference genome sequence, called the consensus sequence (CS), and its feasibility in in silico epidemiological analyses was evaluated.
RESULTS: The CS was successfully constructed and used for phylogenetic analysis, for evaluation of read mapping efficacy, which is crucial for detecting single nucleotide polymorphisms (SNPs), and for virtual versions of various MTBC typing methods, including spoligotyping, VNTR, long sequence polymorphism and Beijing typing. SNPs detected relative to the CS and relative to H37Rv were used in concatemer-based phylogenetic analysis, and their reliability was assessed against a phylogenetic tree based on whole genome alignment as the gold standard. Statistical comparison of the phylogenetic trees based on the CS with those based on H37Rv indicated that the former always gave better results than the latter. SNP detection and concatenation with the CS was advantageous because the frequency of crucial SNPs distinguishing among strain lineages was higher than with H37Rv. The number of SNPs detected was lower with the CS than with the H37Rv sequence, resulting in a significant reduction in computational time. The performance of each virtual typing method was satisfactory and agreed with published results where these were available.
CONCLUSIONS: These results indicate that a virtual CS constructed from genome sequence data is an ideal reference for MTBC studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1368-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4425900
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4425900 2015-05-10 Construction of a virtual Mycobacterium tuberculosis consensus genome and its application to data from a next generation sequencer Okumura, Kayo Kato, Masako Kirikae, Teruo Kayano, Mitsunori Miyoshi-Akiyama, Tohru BMC Genomics Research Article BioMed Central 2015-03-20 /pmc/articles/PMC4425900/ /pubmed/25879806 http://dx.doi.org/10.1186/s12864-015-1368-9 Text en © Okumura et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Construction of a virtual Mycobacterium tuberculosis consensus genome and its application to data from a next generation sequencer
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4425900/
https://www.ncbi.nlm.nih.gov/pubmed/25879806
http://dx.doi.org/10.1186/s12864-015-1368-9
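
The abstract in this record describes aligning MTBC genomes to build a consensus sequence and then concatenating SNPs for phylogenetic analysis, but it does not say how the consensus is computed. The following is only a minimal sketch under stated assumptions: a per-column majority vote over an already aligned, gap-free toy alignment. The function names and the toy sequences are hypothetical, not taken from the article, and the example is not meant to reproduce its results.

```python
from collections import Counter


def majority_consensus(aligned):
    """Per-column most common base of equal-length aligned sequences.

    Assumption: majority vote; the source record does not state the rule used.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(column)
        # Ties are broken alphabetically so the result is deterministic.
        consensus.append(min(counts, key=lambda base: (-counts[base], base)))
    return "".join(consensus)


def variant_positions(reference, aligned):
    """Columns where at least one sequence differs from the reference."""
    return [i for i, ref_base in enumerate(reference)
            if any(seq[i] != ref_base for seq in aligned)]


def count_differences(reference, aligned):
    """Total number of per-strain mismatches against the reference."""
    return sum(a != b for seq in aligned for a, b in zip(reference, seq))


def snp_concatemer(sequence, positions):
    """Concatenate the bases of one sequence at the given variant positions."""
    return "".join(sequence[i] for i in positions)


# Hypothetical toy alignment (four "strains", eight columns, no gaps).
toy_alignment = [
    "ACGTACGT",
    "ACGTACGA",
    "ACCTACGA",
    "ACGTTCGA",
]

consensus = majority_consensus(toy_alignment)
positions = variant_positions(consensus, toy_alignment)
concatemers = [snp_concatemer(seq, positions) for seq in toy_alignment]

# Mismatch totals against the consensus and against one arbitrary strain.
diffs_vs_consensus = count_differences(consensus, toy_alignment)
diffs_vs_single = count_differences(toy_alignment[0], toy_alignment)
```

In this toy case the strains differ from the majority consensus at fewer sites in total than they do from the first strain used as a reference. That is the kind of effect the abstract reports for the CS versus H37Rv, although real alignments contain gaps and ambiguous bases that this sketch does not handle.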