Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families

Bioinformatic tools are currently being developed to better understand the Mycobacterium tuberculosis complex (MTBC). Several approaches already exist for the identification of MTBC lineages using classical genotyping methods such as mycobacterial interspersed repetitive units—variable number of tan...

Bibliographic Details
Main authors: Couvin, David; Segretier, Wilfried; Stattner, Erick; Rastogi, Nalin
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Database Tool
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7737520/
https://www.ncbi.nlm.nih.gov/pubmed/33320180
http://dx.doi.org/10.1093/database/baaa108
author Couvin, David
Segretier, Wilfried
Stattner, Erick
Rastogi, Nalin
collection PubMed
description Bioinformatic tools are currently being developed to better understand the Mycobacterium tuberculosis complex (MTBC). Several approaches already exist for identifying MTBC lineages from classical genotyping methods such as mycobacterial interspersed repetitive units-variable number of tandem DNA repeats (MIRU-VNTR) and spoligotyping-based families. In the recently released SITVIT2 proprietary database of the Institut Pasteur de la Guadeloupe, a large number of spoligotype families were assigned either by manual curation/expertise or by an in-house algorithm. In this study, we present two complementary data-driven approaches that allow fast and precise family prediction from spoligotyping patterns. The first is based on data transformation and decision tree classifiers. The second, in contrast, searches for a set of simple rules using binary masks through a specifically designed evolutionary algorithm. A comparison with the three main approaches in the field highlighted the good performance of our contributions and a significant runtime gain. Finally, we propose the ‘SpolLineages’ software tool (https://github.com/dcouvin/SpolLineages), which implements these approaches for identifying MTBC spoligotype families.
format Online
Article
Text
id pubmed-7737520
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7737520 2020-12-17 Database (Oxford) Database Tool Oxford University Press 2020-12-15 /pmc/articles/PMC7737520/ /pubmed/33320180 http://dx.doi.org/10.1093/database/baaa108 Text en © The Author(s) 2020. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families
topic Database Tool
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7737520/
https://www.ncbi.nlm.nih.gov/pubmed/33320180
http://dx.doi.org/10.1093/database/baaa108