Short read sequence typing (SRST): multi-locus sequence types from short reads
Main authors: | Inouye, Michael; Conway, Thomas C; Zobel, Justin; Holt, Kathryn E |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2012 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3460743/ https://www.ncbi.nlm.nih.gov/pubmed/22827703 http://dx.doi.org/10.1186/1471-2164-13-338 |
author | Inouye, Michael; Conway, Thomas C; Zobel, Justin; Holt, Kathryn E |
---|---|
collection | PubMed |
description | BACKGROUND: Multi-locus sequence typing (MLST) has become the gold standard for population analyses of bacterial pathogens. This method focuses on the sequences of a small number of loci (usually seven) to divide the population; it is simple and robust and facilitates comparison of results between laboratories and over time. Over the last decade, researchers and population health specialists have invested substantial effort in building up public MLST databases for nearly 100 different bacterial species, and these databases contain a wealth of important information linked to MLST sequence types, such as time and place of isolation, host or niche, serotype and even clinical or drug resistance profiles. Recent advances in sequencing technology mean it is increasingly feasible to perform bacterial population analysis at the whole-genome level. This offers massive gains in resolving power and genetic profiling compared to MLST, and will eventually replace MLST for bacterial typing and population analysis. However, given the wealth of data currently available in MLST databases, it is crucial to maintain backwards compatibility with MLST schemes so that new genome analyses can be understood in their proper historical context. RESULTS: We present a software tool, SRST, for quick and accurate retrieval of sequence types from short read sets, using inputs easily downloaded from public databases. SRST uses read mapping, together with an allele assignment score that incorporates sequence coverage and variability, to determine the most likely allele at each MLST locus. Analysis of over 3,500 loci in more than 500 publicly accessible Illumina read sets showed SRST to be highly accurate at allele assignment. SRST output is compatible with common analysis tools such as eBURST, ClonalFrame or PHYLOViZ, allowing easy comparison between novel genome data and MLST data. Alignment, FASTQ and pileup files can also be generated for novel alleles. CONCLUSIONS: SRST is a novel software tool for accurate assignment of sequence types using short read data. Several uses for the tool are demonstrated, including quality control for high-throughput sequencing projects, plasmid MLST and analysis of genomic data during outbreak investigation. SRST is open-source, requires Python, BWA and SAMtools, and is available from http://srst.sourceforge.net. |
format | Online Article Text |
id | pubmed-3460743 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3460743, 2012-10-02. BMC Genomics, Software. BioMed Central, 2012-07-24. Copyright ©2012 Inouye et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Short read sequence typing (SRST): multi-locus sequence types from short reads |
topic | Software |
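The abstract describes SRST's core step only at a high level: reads are mapped against the known alleles of each MLST locus, and an allele assignment score combining coverage and variability selects the most likely allele, whose combination then gives the sequence type. The Python sketch below illustrates that idea under explicit assumptions. The per-position depth and mismatch counts stand in for what a pileup would provide. The scoring rule (fraction of positions covered at a minimum depth, multiplied by one minus the overall mismatch rate) is a hypothetical simplification, not SRST's published score. All names (`PositionCounts`, `allele_score`, `best_allele`, `lookup_st`, `call_sequence_type`) and the toy two-locus scheme are illustrative only; real MLST schemes usually use seven loci.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PositionCounts:
    """Pileup-style counts at one position of a candidate allele (hypothetical)."""
    depth: int       # reads covering this position
    mismatches: int  # covering reads whose base differs from the allele


def allele_score(positions: List[PositionCounts], min_depth: int = 5) -> float:
    """Hypothetical score in [0, 1], NOT SRST's published formula:
    (fraction of positions with depth >= min_depth) * (1 - mismatch rate)."""
    if not positions:
        return 0.0
    covered = sum(1 for p in positions if p.depth >= min_depth)
    total_depth = sum(p.depth for p in positions)
    if total_depth == 0:
        return 0.0
    mismatch_rate = sum(p.mismatches for p in positions) / total_depth
    return (covered / len(positions)) * (1.0 - mismatch_rate)


def best_allele(candidates: Dict[int, List[PositionCounts]]) -> Tuple[int, float]:
    """Return (allele number, score) of the top-scoring candidate.
    Ties keep the first candidate seen; raises ValueError if empty."""
    return max(((num, allele_score(pos)) for num, pos in candidates.items()),
               key=lambda item: item[1])


def lookup_st(calls: Dict[str, int], loci: List[str],
              profiles: Dict[Tuple[int, ...], str]) -> Optional[str]:
    """Map the allelic profile (allele numbers in scheme order) to an ST,
    or None if that combination is absent from the profile table."""
    return profiles.get(tuple(calls[locus] for locus in loci))


def call_sequence_type(locus_candidates: Dict[str, Dict[int, List[PositionCounts]]],
                       loci: List[str],
                       profiles: Dict[Tuple[int, ...], str]
                       ) -> Tuple[Dict[str, int], Optional[str]]:
    """Pick the best allele at every locus, then look up the sequence type."""
    calls = {locus: best_allele(locus_candidates[locus])[0] for locus in loci}
    return calls, lookup_st(calls, loci, profiles)


if __name__ == "__main__":
    # Toy data only: two candidate alleles per locus, 10 positions each.
    perfect = [PositionCounts(depth=20, mismatches=0)] * 10
    noisy = [PositionCounts(depth=20, mismatches=4)] * 10
    shallow = [PositionCounts(depth=2, mismatches=0)] * 10
    candidates = {"locA": {1: noisy, 2: perfect},
                  "locB": {5: perfect, 6: shallow}}
    profiles = {(2, 5): "ST-toy"}  # hypothetical two-locus profile table
    print(call_sequence_type(candidates, ["locA", "locB"], profiles))
    # prints ({'locA': 2, 'locB': 5}, 'ST-toy')
```

Multiplying coverage by read agreement means a candidate scores highly only if it is both fully covered and consistent with the reads. Under this toy rule, a best score noticeably below 1.0 at a well-covered locus would hint at a possible novel allele; the abstract notes that SRST can generate alignment, FASTQ and pileup files for such cases.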