
In silico region of difference (RD) analysis of Mycobacterium tuberculosis complex from sequence reads using RD-Analyzer

BACKGROUND: Whole-genome sequencing is increasingly used in clinical diagnosis of tuberculosis and study of Mycobacterium tuberculosis complex (MTC). MTC consists of several genetically homogenous mycobacteria species which can cause tuberculosis in humans and animals. Regions of difference (RDs) ar...


Bibliographic Details
Main Authors: Faksri, Kiatichai, Xia, Eryu, Tan, Jun Hao, Teo, Yik-Ying, Ong, Rick Twee-Hee
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5093977/
https://www.ncbi.nlm.nih.gov/pubmed/27806686
http://dx.doi.org/10.1186/s12864-016-3213-1
author Faksri, Kiatichai
Xia, Eryu
Tan, Jun Hao
Teo, Yik-Ying
Ong, Rick Twee-Hee
collection PubMed
description BACKGROUND: Whole-genome sequencing is increasingly used in the clinical diagnosis of tuberculosis and in the study of the Mycobacterium tuberculosis complex (MTC). The MTC consists of several genetically homogeneous mycobacterial species that can cause tuberculosis in humans and animals. Regions of difference (RDs) are commonly regarded as gold-standard genetic markers for MTC classification. RESULTS: We developed RD-Analyzer, a tool that can accurately infer the species and lineage of MTC isolates from sequence reads based on the presence or absence of a set of 31 RDs. Applied to a publicly available, diverse set of 377 sequenced MTC isolates from known major species and lineages, RD-Analyzer achieved an accuracy of 98.14% (370/377) in species prediction and a concordance of 98.47% (257/261) in Mycobacterium tuberculosis lineage prediction compared with predictions based on single nucleotide polymorphism markers. By comparing sequencing read depths at each genomic position between isolates of different sublineages, we were able to identify the known RD markers in different sublineages of Lineage 4 and provide support for six potential delineating markers with high sensitivities and specificities for sublineage prediction. An extended version of RD-Analyzer was thus developed to allow user-defined RDs for lineage prediction. CONCLUSIONS: RD-Analyzer is a useful and accurate tool for species, lineage and sublineage prediction using known RDs of MTC from sequence reads, and it can be extended to accept user-defined RDs for analysis. RD-Analyzer is written in Python and is freely available at https://github.com/xiaeryu/RD-Analyzer. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3213-1) contains supplementary material, which is available to authorized users.
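The abstract describes calling RDs as present or absent from sequencing read depth. The idea can be illustrated with a minimal Python sketch, which is not RD-Analyzer's actual code or algorithm. It assumes a per-position depth list is already available, and the function names (`rd_is_deleted`, `call_rds`), the region names and coordinates, and the 0.1 depth-ratio threshold are all hypothetical choices made for illustration.

```python
from statistics import median


def rd_is_deleted(depth, start, end, min_ratio=0.1):
    """Return True if the region [start, end) looks deleted.

    The region is treated as deleted when its median read depth is below
    `min_ratio` times the genome-wide median depth (illustrative threshold).
    """
    if not 0 <= start < end <= len(depth):
        raise ValueError("region lies outside the depth array")
    genome_median = median(depth)
    if genome_median == 0:
        raise ValueError("genome-wide median depth is zero; cannot normalise")
    return median(depth[start:end]) / genome_median < min_ratio


def call_rds(depth, regions, min_ratio=0.1):
    """Map each region name to 'absent' (deleted) or 'present'."""
    return {
        name: "absent" if rd_is_deleted(depth, start, end, min_ratio) else "present"
        for name, (start, end) in regions.items()
    }


# Toy genome of 1000 positions at depth 50, with no reads over 200..299.
toy_depth = [50] * 1000
toy_depth[200:300] = [0] * 100

# Hypothetical region names and coordinates, not real RD definitions.
toy_regions = {"RD_a": (200, 300), "RD_b": (600, 700)}

calls = call_rds(toy_depth, toy_regions)
print(calls)
```

A presence/absence profile of this kind could then be compared against a table of expected RD patterns per species or lineage. Using medians rather than means keeps a few high-coverage positions from masking a deletion, though a real tool would also have to handle repetitive regions and uneven coverage.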
format Online
Article
Text
id pubmed-5093977
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5093977 2016-11-07 BMC Genomics Software BioMed Central 2016-11-02 /pmc/articles/PMC5093977/ /pubmed/27806686 http://dx.doi.org/10.1186/s12864-016-3213-1 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title In silico region of difference (RD) analysis of Mycobacterium tuberculosis complex from sequence reads using RD-Analyzer
topic Software