FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies

Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a d...

Bibliographic Details
Main authors: Guerrero-Araya, Enzo; Muñoz, Marina; Rodríguez, César; Paredes-Sabja, Daniel
Format: Online Article Text
Language: English
Published: SAGE Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637782/
https://www.ncbi.nlm.nih.gov/pubmed/34866905
http://dx.doi.org/10.1177/11779322211059238
author Guerrero-Araya, Enzo
Muñoz, Marina
Rodríguez, César
Paredes-Sabja, Daniel
collection PubMed
description Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool that is designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output offered by FastMLST includes a table with the ST, allelic profile, and clonal complex or clade (when available), detected for a query, as well as a multi-FASTA file or a series of FASTA files with the concatenated or single allele sequences detected, respectively. FastMLST was validated with 91 different species, with a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. Compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of multiple processors to simultaneously type up to 28 000 genomes in less than 10 minutes, reducing processing times by at least 3-fold with 100% concordance to PubMLST, if contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST
format Online
Article
Text
id pubmed-8637782
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-8637782 2021-12-03 Bioinform Biol Insights Short Report SAGE Publications 2021-11-27 /pmc/articles/PMC8637782/ /pubmed/34866905 http://dx.doi.org/10.1177/11779322211059238 Text en © The Author(s) 2021
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies
topic Short Report