Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families
Bioinformatic tools are currently being developed to better understand the Mycobacterium tuberculosis complex (MTBC). Several approaches already exist for the identification of MTBC lineages using classical genotyping methods such as mycobacterial interspersed repetitive units—variable number of tan...
Main authors: | Couvin, David; Segretier, Wilfried; Stattner, Erick; Rastogi, Nalin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Database Tool |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7737520/ https://www.ncbi.nlm.nih.gov/pubmed/33320180 http://dx.doi.org/10.1093/database/baaa108 |
_version_ | 1783622955425595392 |
---|---|
author | Couvin, David Segretier, Wilfried Stattner, Erick Rastogi, Nalin |
author_facet | Couvin, David Segretier, Wilfried Stattner, Erick Rastogi, Nalin |
author_sort | Couvin, David |
collection | PubMed |
description | Bioinformatic tools are currently being developed to better understand the Mycobacterium tuberculosis complex (MTBC). Several approaches already exist for the identification of MTBC lineages using classical genotyping methods such as mycobacterial interspersed repetitive units—variable number of tandem DNA repeats and spoligotyping-based families. In the recently released SITVIT2 proprietary database of the Institut Pasteur de la Guadeloupe, a large number of spoligotype families were assigned by either manual curation/expertise or using an in-house algorithm. In this study, we present two complementary data-driven approaches allowing fast and precise family prediction from spoligotyping patterns. The first one is based on data transformation and the use of decision tree classifiers. In contrast, the second one searches for a set of simple rules using binary masks through a specifically designed evolutionary algorithm. The comparison with the three main approaches in the field highlighted the good performances of our contributions and the significant runtime gain. Finally, we propose the ‘SpolLineages’ software tool (https://github.com/dcouvin/SpolLineages), which implements these approaches for MTBC spoligotype families’ identification. |
format | Online Article Text |
id | pubmed-7737520 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-77375202020-12-17 Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families Couvin, David Segretier, Wilfried Stattner, Erick Rastogi, Nalin Database (Oxford) Database Tool Bioinformatic tools are currently being developed to better understand the Mycobacterium tuberculosis complex (MTBC). Several approaches already exist for the identification of MTBC lineages using classical genotyping methods such as mycobacterial interspersed repetitive units—variable number of tandem DNA repeats and spoligotyping-based families. In the recently released SITVIT2 proprietary database of the Institut Pasteur de la Guadeloupe, a large number of spoligotype families were assigned by either manual curation/expertise or using an in-house algorithm. In this study, we present two complementary data-driven approaches allowing fast and precise family prediction from spoligotyping patterns. The first one is based on data transformation and the use of decision tree classifiers. In contrast, the second one searches for a set of simple rules using binary masks through a specifically designed evolutionary algorithm. The comparison with the three main approaches in the field highlighted the good performances of our contributions and the significant runtime gain. Finally, we propose the ‘SpolLineages’ software tool (https://github.com/dcouvin/SpolLineages), which implements these approaches for MTBC spoligotype families’ identification. Oxford University Press 2020-12-15 /pmc/articles/PMC7737520/ /pubmed/33320180 http://dx.doi.org/10.1093/database/baaa108 Text en © The Author(s) 2020. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Database Tool Couvin, David Segretier, Wilfried Stattner, Erick Rastogi, Nalin Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families |
title | Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families |
title_full | Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families |
title_fullStr | Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families |
title_full_unstemmed | Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families |
title_short | Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families |
title_sort | novel methods included in spollineages tool for fast and precise prediction of mycobacterium tuberculosis complex spoligotype families |
topic | Database Tool |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7737520/ https://www.ncbi.nlm.nih.gov/pubmed/33320180 http://dx.doi.org/10.1093/database/baaa108 |
work_keys_str_mv | AT couvindavid novelmethodsincludedinspollineagestoolforfastandprecisepredictionofmycobacteriumtuberculosiscomplexspoligotypefamilies AT segretierwilfried novelmethodsincludedinspollineagestoolforfastandprecisepredictionofmycobacteriumtuberculosiscomplexspoligotypefamilies AT stattnererick novelmethodsincludedinspollineagestoolforfastandprecisepredictionofmycobacteriumtuberculosiscomplexspoligotypefamilies AT rastoginalin novelmethodsincludedinspollineagestoolforfastandprecisepredictionofmycobacteriumtuberculosiscomplexspoligotypefamilies |
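
The description in this record mentions a second approach that "searches for a set of simple rules using binary masks". As a rough illustration of how such a rule could be applied to a spoligotype once it has been found, here is a minimal Python sketch. It assumes the standard 43-spacer binary format (1 = spacer present, 0 = absent) and does not reproduce the evolutionary search itself. The rule representation, the function names and the two toy families are hypothetical; they are not taken from SpolLineages or from SITVIT2.

```python
from typing import Iterable, List, NamedTuple, Optional

N_SPACERS = 43  # the standard spoligotyping format uses 43 spacers


class MaskRule(NamedTuple):
    """Hypothetical rule: the spacers selected by `care` must equal `value`."""
    family: str
    care: int   # bits set at the spacer positions the rule constrains
    value: int  # required state of those positions (subset of `care`)


def pattern_to_int(pattern: str) -> int:
    """Convert a 43-character '0'/'1' string to an integer bit field."""
    if len(pattern) != N_SPACERS or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character binary spoligotype")
    return int(pattern, 2)


def make_rule(family: str,
              present: Iterable[int] = (),
              absent: Iterable[int] = ()) -> MaskRule:
    """Build a rule from 1-based spacer positions (spacer 1 is leftmost)."""
    care = value = 0
    for pos in present:
        bit = 1 << (N_SPACERS - pos)
        care |= bit
        value |= bit
    for pos in absent:
        care |= 1 << (N_SPACERS - pos)
    return MaskRule(family, care, value)


def predict(pattern: str, rules: List[MaskRule]) -> Optional[str]:
    """Return the family of the first matching rule, or None if none match."""
    bits = pattern_to_int(pattern)
    for rule in rules:
        if (bits & rule.care) == rule.value:
            return rule.family
    return None


# Toy rules for illustration only; real masks would come from the search.
RULES = [
    make_rule("toy_family_A", absent=range(1, 35), present=[35, 36, 37]),
    make_rule("toy_family_B", absent=[33, 34, 35, 36], present=[32, 37]),
]

if __name__ == "__main__":
    print(predict("0" * 34 + "1" * 9, RULES))
    print(predict("1" * 32 + "0000" + "1" * 7, RULES))
```

Each rule check costs one bitwise AND and one comparison, which is one reason mask-based rules can be evaluated quickly. An evolutionary algorithm, as mentioned in the description, would be responsible for proposing and refining the `care` and `value` masks against labelled data.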