Data mining tools for Salmonella characterization: application to gel-based fingerprinting analysis

BACKGROUND: Pulsed field gel electrophoresis (PFGE) is currently the most widely and routinely used method by the Centers for Disease Control and Prevention (CDC) and state health labs in the United States for Salmonella surveillance and outbreak tracking. Major drawbacks of commercially available P...

Bibliographic Details
Main authors: Zou, Wen, Tang, Hailin, Zhao, Weizhong, Meehan, Joe, Foley, Steven L, Lin, Wei-Jiun, Chen, Hung-Chia, Fang, Hong, Nayak, Rajesh, Chen, James J
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3851133/
https://www.ncbi.nlm.nih.gov/pubmed/24267777
http://dx.doi.org/10.1186/1471-2105-14-S14-S15
author Zou, Wen
Tang, Hailin
Zhao, Weizhong
Meehan, Joe
Foley, Steven L
Lin, Wei-Jiun
Chen, Hung-Chia
Fang, Hong
Nayak, Rajesh
Chen, James J
collection PubMed
description BACKGROUND: Pulsed field gel electrophoresis (PFGE) is currently the method most widely and routinely used by the Centers for Disease Control and Prevention (CDC) and state health labs in the United States for Salmonella surveillance and outbreak tracking. Major drawbacks of commercially available PFGE analysis programs have been their difficulty in dealing with large datasets and the limited availability of analysis tools. New analytical tools for PFGE data mining are needed to make full use of the valuable data in large surveillance databases. RESULTS: In this study, a software package was developed that implements five types of bioinformatics approaches for the analysis and visualization of PFGE fingerprinting data. The approaches include PFGE band standardization, Salmonella serotype prediction, hierarchical cluster analysis, distance matrix analysis and two-way hierarchical cluster analysis. PFGE band standardization makes cross-group analysis of large datasets possible. The Salmonella serotype prediction approach allows users to predict the serotypes of Salmonella isolates based on their PFGE patterns. The hierarchical cluster analysis approach can be used to clarify subtypes and phylogenetic relationships among groups of PFGE patterns. The distance matrix and two-way hierarchical cluster analysis tools allow users to directly visualize the similarities and dissimilarities of any two individual patterns and the inter- and intra-serotype relationships of two or more serotypes; they also provide a summary of the overall relationships between user-selected serotypes and the band markers that distinguish these serotypes. The functionality of these tools was illustrated on PFGE fingerprinting data from the CDC's PulseNet. CONCLUSIONS: The bioinformatics approaches included in the software package developed in this study were integrated with the PFGE database to enhance the data mining of PFGE fingerprints. Fast and accurate prediction makes it possible to elucidate Salmonella serotype information before conventional serological methods are pursued. The development of bioinformatics tools to distinguish PFGE markers and serotype-specific patterns will enhance PFGE data retrieval, interpretation and serotype identification and will likely accelerate source tracking to identify the Salmonella isolates implicated in foodborne diseases.
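The distance matrix and hierarchical cluster analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' software: the record does not say which similarity measure or linkage rule the package uses, so the Dice coefficient and average linkage (UPGMA-style) are assumptions chosen because they are common in gel fingerprint comparison. The isolate names and band sizes are hypothetical, and each pattern is assumed to be already standardized to a set of band positions.

```python
from itertools import combinations


def dice_distance(a, b):
    """Dice dissimilarity between two sets of standardized band positions."""
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def distance_matrix(patterns):
    """Symmetric nested dict of pairwise Dice distances, keyed by isolate name."""
    dist = {name: {name: 0.0} for name in patterns}
    for x, y in combinations(patterns, 2):
        d = dice_distance(patterns[x], patterns[y])
        dist[x][y] = dist[y][x] = d
    return dist


def average_linkage(patterns):
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of isolate names.
    """
    dist = distance_matrix(patterns)

    def avg(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in patterns]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((a, b, avg(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical isolates; band sizes (kb) are invented for illustration only.
patterns = {
    "iso_A": {20, 50, 100, 300},
    "iso_B": {20, 50, 100, 350},
    "iso_C": {30, 80, 200, 600},
}
merges = average_linkage(patterns)
```

In this toy input, iso_A and iso_B share three of four bands and are merged first, while iso_C shares no bands with either and joins last. A serotype prediction step could, under the same assumptions, assign an unknown pattern the serotype of its nearest labeled neighbor in the distance matrix.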
format Online
Article
Text
id pubmed-3851133
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-38511332013-12-13 Data mining tools for Salmonella characterization: application to gel-based fingerprinting analysis Zou, Wen Tang, Hailin Zhao, Weizhong Meehan, Joe Foley, Steven L Lin, Wei-Jiun Chen, Hung-Chia Fang, Hong Nayak, Rajesh Chen, James J BMC Bioinformatics Proceedings BioMed Central 2013-10-09 /pmc/articles/PMC3851133/ /pubmed/24267777 http://dx.doi.org/10.1186/1471-2105-14-S14-S15 Text en Copyright © 2013 Zou et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Data mining tools for Salmonella characterization: application to gel-based fingerprinting analysis
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3851133/
https://www.ncbi.nlm.nih.gov/pubmed/24267777
http://dx.doi.org/10.1186/1471-2105-14-S14-S15