Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data
In clinical microbiology, matrix-assisted laser desorption ionization–time-of-flight mass spectrometry (MALDI-TOF MS) is frequently employed for rapid microbial identification. However, rapid identification of antimicrobial resistance (AMR) in Escherichia coli based on a large amount of MALDI-TOF MS...
Main authors: | Chung, Chia-Ru; Wang, Hsin-Yao; Yao, Chun-Han; Wu, Li-Ching; Lu, Jang-Jih; Horng, Jorng-Tzong; Lee, Tzong-Yi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Society for Microbiology 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10269626/ https://www.ncbi.nlm.nih.gov/pubmed/37042778 http://dx.doi.org/10.1128/spectrum.03479-22 |
_version_ | 1785059211574509568 |
---|---|
author | Chung, Chia-Ru Wang, Hsin-Yao Yao, Chun-Han Wu, Li-Ching Lu, Jang-Jih Horng, Jorng-Tzong Lee, Tzong-Yi |
author_facet | Chung, Chia-Ru Wang, Hsin-Yao Yao, Chun-Han Wu, Li-Ching Lu, Jang-Jih Horng, Jorng-Tzong Lee, Tzong-Yi |
author_sort | Chung, Chia-Ru |
collection | PubMed |
description | In clinical microbiology, matrix-assisted laser desorption ionization–time-of-flight mass spectrometry (MALDI-TOF MS) is frequently employed for rapid microbial identification. However, rapid identification of antimicrobial resistance (AMR) in Escherichia coli based on a large amount of MALDI-TOF MS data has not yet been reported. This may be because building a single prediction model to cover all E. coli isolates is challenging given the high diversity of the E. coli population. This study aimed to develop a MALDI-TOF MS-based, data-driven, two-stage framework for characterizing different AMRs in E. coli. Specifically, amoxicillin (AMC), ceftazidime (CAZ), ciprofloxacin (CIP), ceftriaxone (CRO), and cefuroxime (CXM) were used. In the first stage, we split the data into two groups based on informative peaks selected by random forest feature importance. In the second stage, prediction models were constructed using four different machine learning algorithms: logistic regression, support vector machine, random forest, and extreme gradient boosting (XGBoost). The findings demonstrate that XGBoost outperformed the other three machine learning models. The values of the area under the receiver operating characteristic curve were 0.62, 0.72, 0.87, 0.72, and 0.72 for AMC, CAZ, CIP, CRO, and CXM, respectively. This implies that a data-driven, two-stage framework could improve accuracy by approximately 2.8%. As a result, we developed AMR prediction models for E. coli using a data-driven two-stage framework, which is promising for assisting physicians in making decisions. Further, the analysis of informative peaks in future studies could potentially reveal new insights. IMPORTANCE Based on a large amount of matrix-assisted laser desorption ionization–time-of-flight mass spectrometry (MALDI-TOF MS) clinical data, comprising 37,918 Escherichia coli isolates, a data-driven two-stage framework was established to evaluate the antimicrobial resistance of E. coli. Five antibiotics, namely amoxicillin (AMC), ceftazidime (CAZ), ciprofloxacin (CIP), ceftriaxone (CRO), and cefuroxime (CXM), were considered for the two-stage model training, and the values of the area under the receiver operating characteristic curve (AUC) were 0.62 for AMC, 0.72 for CAZ, 0.87 for CIP, 0.72 for CRO, and 0.72 for CXM. Further investigations revealed that the informative peak m/z 9714 appeared with some important peaks at m/z 6809, m/z 7650, m/z 10534, and m/z 11783 for CIP and at m/z 6809, m/z 10475, and m/z 8447 for CAZ, CRO, and CXM. This framework has the potential to improve the accuracy by approximately 2.8%, which makes it promising for further research. |
format | Online Article Text |
id | pubmed-10269626 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Society for Microbiology |
record_format | MEDLINE/PubMed |
spelling | pubmed-10269626 2023-06-16 Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data Chung, Chia-Ru Wang, Hsin-Yao Yao, Chun-Han Wu, Li-Ching Lu, Jang-Jih Horng, Jorng-Tzong Lee, Tzong-Yi Microbiol Spectr Research Article In clinical microbiology, matrix-assisted laser desorption ionization–time-of-flight mass spectrometry (MALDI-TOF MS) is frequently employed for rapid microbial identification. However, rapid identification of antimicrobial resistance (AMR) in Escherichia coli based on a large amount of MALDI-TOF MS data has not yet been reported. This may be because building a single prediction model to cover all E. coli isolates is challenging given the high diversity of the E. coli population. This study aimed to develop a MALDI-TOF MS-based, data-driven, two-stage framework for characterizing different AMRs in E. coli. Specifically, amoxicillin (AMC), ceftazidime (CAZ), ciprofloxacin (CIP), ceftriaxone (CRO), and cefuroxime (CXM) were used. In the first stage, we split the data into two groups based on informative peaks selected by random forest feature importance. In the second stage, prediction models were constructed using four different machine learning algorithms: logistic regression, support vector machine, random forest, and extreme gradient boosting (XGBoost). The findings demonstrate that XGBoost outperformed the other three machine learning models. The values of the area under the receiver operating characteristic curve were 0.62, 0.72, 0.87, 0.72, and 0.72 for AMC, CAZ, CIP, CRO, and CXM, respectively. This implies that a data-driven, two-stage framework could improve accuracy by approximately 2.8%. As a result, we developed AMR prediction models for E. coli using a data-driven two-stage framework, which is promising for assisting physicians in making decisions. Further, the analysis of informative peaks in future studies could potentially reveal new insights. IMPORTANCE Based on a large amount of matrix-assisted laser desorption ionization–time-of-flight mass spectrometry (MALDI-TOF MS) clinical data, comprising 37,918 Escherichia coli isolates, a data-driven two-stage framework was established to evaluate the antimicrobial resistance of E. coli. Five antibiotics, namely amoxicillin (AMC), ceftazidime (CAZ), ciprofloxacin (CIP), ceftriaxone (CRO), and cefuroxime (CXM), were considered for the two-stage model training, and the values of the area under the receiver operating characteristic curve (AUC) were 0.62 for AMC, 0.72 for CAZ, 0.87 for CIP, 0.72 for CRO, and 0.72 for CXM. Further investigations revealed that the informative peak m/z 9714 appeared with some important peaks at m/z 6809, m/z 7650, m/z 10534, and m/z 11783 for CIP and at m/z 6809, m/z 10475, and m/z 8447 for CAZ, CRO, and CXM. This framework has the potential to improve the accuracy by approximately 2.8%, which makes it promising for further research. American Society for Microbiology 2023-04-12 /pmc/articles/PMC10269626/ /pubmed/37042778 http://dx.doi.org/10.1128/spectrum.03479-22 Text en Copyright © 2023 Chung et al. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Chung, Chia-Ru Wang, Hsin-Yao Yao, Chun-Han Wu, Li-Ching Lu, Jang-Jih Horng, Jorng-Tzong Lee, Tzong-Yi Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data |
title | Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data |
title_full | Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data |
title_fullStr | Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data |
title_full_unstemmed | Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data |
title_short | Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data |
title_sort | data-driven two-stage framework for identification and characterization of different antibiotic-resistant escherichia coli isolates based on mass spectrometry data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10269626/ https://www.ncbi.nlm.nih.gov/pubmed/37042778 http://dx.doi.org/10.1128/spectrum.03479-22 |
work_keys_str_mv | AT chungchiaru datadriventwostageframeworkforidentificationandcharacterizationofdifferentantibioticresistantescherichiacoliisolatesbasedonmassspectrometrydata AT wanghsinyao datadriventwostageframeworkforidentificationandcharacterizationofdifferentantibioticresistantescherichiacoliisolatesbasedonmassspectrometrydata AT yaochunhan datadriventwostageframeworkforidentificationandcharacterizationofdifferentantibioticresistantescherichiacoliisolatesbasedonmassspectrometrydata AT wuliching datadriventwostageframeworkforidentificationandcharacterizationofdifferentantibioticresistantescherichiacoliisolatesbasedonmassspectrometrydata AT lujangjih datadriventwostageframeworkforidentificationandcharacterizationofdifferentantibioticresistantescherichiacoliisolatesbasedonmassspectrometrydata AT horngjorngtzong datadriventwostageframeworkforidentificationandcharacterizationofdifferentantibioticresistantescherichiacoliisolatesbasedonmassspectrometrydata AT leetzongyi datadriventwostageframeworkforidentificationandcharacterizationofdifferentantibioticresistantescherichiacoliisolatesbasedonmassspectrometrydata |
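
The two-stage idea summarized in the description (split isolates by an informative peak, then evaluate a model per group by AUC) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the spectrum representation (a dictionary from binned m/z to intensity), the function names `has_peak`, `split_by_peak`, and `auc`, and the ±5 m/z tolerance are all assumptions made for illustration. The second-stage classifier itself (XGBoost in the article) is not reproduced; any model that outputs a resistance score per isolate could feed `auc`.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one spectrum maps a binned m/z value to an intensity.
Spectrum = Dict[int, float]


def has_peak(spectrum: Spectrum, mz: int, tol: int = 5) -> bool:
    """Return True if a nonzero peak lies within +/- tol of the target m/z.

    The tolerance of 5 is an illustrative assumption, not a value from the article.
    """
    return any(abs(m - mz) <= tol and inten > 0 for m, inten in spectrum.items())


def split_by_peak(
    spectra: List[Spectrum], labels: List[int], mz: int, tol: int = 5
) -> Dict[bool, Tuple[List[Spectrum], List[int]]]:
    """Stage 1: partition isolates by presence (True) or absence (False) of a peak."""
    groups: Dict[bool, Tuple[List[Spectrum], List[int]]] = {
        True: ([], []),
        False: ([], []),
    }
    for spectrum, label in zip(spectra, labels):
        group_spectra, group_labels = groups[has_peak(spectrum, mz, tol)]
        group_spectra.append(spectrum)
        group_labels.append(label)
    return groups


def auc(scores: List[float], labels: List[int]) -> float:
    """Rank-based AUC: the fraction of (resistant, susceptible) pairs in which the
    resistant isolate (label 1) gets the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one isolate of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In use, `split_by_peak(spectra, labels, 9714)` would give the two stage-one groups for the informative peak named in the record, and a separate stage-two model would then be trained and scored with `auc` within each group.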