Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm

Infra-species taxonomy is a prerequisite to compare features such as virulence in different pathogen lineages. Mycobacterium tuberculosis complex taxonomy has rapidly evolved in the last 20 years through intensive clinical isolation, advances in sequencing and in the description of fast-evolving loc...

Bibliographic Details
Main authors: Azé, Jérôme, Sola, Christophe, Zhang, Jian, Lafosse-Marin, Florian, Yasmin, Memona, Siddiqui, Rubina, Kremer, Kristin, van Soolingen, Dick, Refrégier, Guislaine
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4496040/
https://www.ncbi.nlm.nih.gov/pubmed/26154264
http://dx.doi.org/10.1371/journal.pone.0130912
_version_ 1782380333967081472
author Azé, Jérôme
Sola, Christophe
Zhang, Jian
Lafosse-Marin, Florian
Yasmin, Memona
Siddiqui, Rubina
Kremer, Kristin
van Soolingen, Dick
Refrégier, Guislaine
author_facet Azé, Jérôme
Sola, Christophe
Zhang, Jian
Lafosse-Marin, Florian
Yasmin, Memona
Siddiqui, Rubina
Kremer, Kristin
van Soolingen, Dick
Refrégier, Guislaine
author_sort Azé, Jérôme
collection PubMed
description Infra-species taxonomy is a prerequisite for comparing features such as virulence across pathogen lineages. Mycobacterium tuberculosis complex taxonomy has evolved rapidly over the last 20 years through intensive clinical isolation, advances in sequencing, and the description of fast-evolving loci (CRISPR and MIRU-VNTR). Online tools to describe new isolates have been set up based on known diversity in either CRISPRs (also known as spoligotypes) or MIRU-VNTR profiles. The underlying taxonomies are largely concordant but use different names and offer different depths. The objectives of this study were 1) to make explicit the consensus that exists between the alternative taxonomies, and 2) to provide an online tool to ease classification of new isolates. Genotyping (24-VNTR, 43-spacer spoligotypes, IS6110-RFLP) was undertaken for 3,454 clinical isolates from the Netherlands (2004-2008). The resulting database was enlarged with African isolates to include most human tuberculosis diversity. Assignments were obtained using TB-Lineage, MIRU-VNTRPlus, SITVITWEB and an algorithm from Borile et al. By identifying the recurrent concordances between the alternative taxonomies, we proposed a consensus including 22 sublineages. Original and consensus assignments of all isolates from the database were subsequently implemented in an ensemble learning approach based on the Machine Learning tool Weka to derive a classification scheme. All assignments were reproduced with very good sensitivities and specificities. When applied to independent datasets, the scheme was able to suggest new sublineages such as pseudo-Beijing. This Lineage Prediction tool, effective on 15-MIRU, 24-VNTR and spoligotype data, is available on the web interface “TBminer.” Another section of this website helps summarize key molecular epidemiological data, easing tuberculosis surveillance. Altogether, we successfully used Machine Learning on a large dataset to set up and make available the first consensus taxonomy for human Mycobacterium tuberculosis complex. Additional developments using SNPs will help stabilize it.
format Online
Article
Text
id pubmed-4496040
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-44960402015-07-15 Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm Azé, Jérôme Sola, Christophe Zhang, Jian Lafosse-Marin, Florian Yasmin, Memona Siddiqui, Rubina Kremer, Kristin van Soolingen, Dick Refrégier, Guislaine PLoS One Research Article Infra-species taxonomy is a prerequisite to compare features such as virulence in different pathogen lineages. Mycobacterium tuberculosis complex taxonomy has rapidly evolved in the last 20 years through intensive clinical isolation, advances in sequencing and in the description of fast-evolving loci (CRISPR and MIRU-VNTR). On-line tools to describe new isolates have been set up based on known diversity either on CRISPRs (also known as spoligotypes) or on MIRU-VNTR profiles. The underlying taxonomies are largely concordant but use different names and offer different depths. The objectives of this study were 1) to explicit the consensus that exists between the alternative taxonomies, and 2) to provide an on-line tool to ease classification of new isolates. Genotyping (24-VNTR, 43-spacers spoligotypes, IS6110-RFLP) was undertaken for 3,454 clinical isolates from the Netherlands (2004-2008). The resulting database was enlarged with African isolates to include most human tuberculosis diversity. Assignations were obtained using TB-Lineage, MIRU-VNTRPlus, SITVITWEB and an algorithm from Borile et al. By identifying the recurrent concordances between the alternative taxonomies, we proposed a consensus including 22 sublineages. Original and consensus assignations of the all isolates from the database were subsequently implemented into an ensemble learning approach based on Machine Learning tool Weka to derive a classification scheme. All assignations were reproduced with very good sensibilities and specificities. When applied to independent datasets, it was able to suggest new sublineages such as pseudo-Beijing. 
This Lineage Prediction tool, efficient on 15-MIRU, 24-VNTR and spoligotype data is available on the web interface “TBminer.” Another section of this website helps summarizing key molecular epidemiological data, easing tuberculosis surveillance. Altogether, we successfully used Machine Learning on a large dataset to set up and make available the first consensual taxonomy for human Mycobacterium tuberculosis complex. Additional developments using SNPs will help stabilizing it. Public Library of Science 2015-07-08 /pmc/articles/PMC4496040/ /pubmed/26154264 http://dx.doi.org/10.1371/journal.pone.0130912 Text en © 2015 Azé et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Azé, Jérôme
Sola, Christophe
Zhang, Jian
Lafosse-Marin, Florian
Yasmin, Memona
Siddiqui, Rubina
Kremer, Kristin
van Soolingen, Dick
Refrégier, Guislaine
Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm
title Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm
title_short Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm
title_sort genomics and machine learning for taxonomy consensus: the mycobacterium tuberculosis complex paradigm
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4496040/
https://www.ncbi.nlm.nih.gov/pubmed/26154264
http://dx.doi.org/10.1371/journal.pone.0130912