
Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data

In clinical microbiology, matrix-assisted laser desorption ionization–time-of-flight mass spectrometry (MALDI-TOF MS) is frequently employed for rapid microbial identification. However, rapid identification of antimicrobial resistance (AMR) in Escherichia coli based on a large amount of MALDI-TOF MS...


Bibliographic Details
Main authors: Chung, Chia-Ru, Wang, Hsin-Yao, Yao, Chun-Han, Wu, Li-Ching, Lu, Jang-Jih, Horng, Jorng-Tzong, Lee, Tzong-Yi
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10269626/
https://www.ncbi.nlm.nih.gov/pubmed/37042778
http://dx.doi.org/10.1128/spectrum.03479-22
_version_ 1785059211574509568
author Chung, Chia-Ru
Wang, Hsin-Yao
Yao, Chun-Han
Wu, Li-Ching
Lu, Jang-Jih
Horng, Jorng-Tzong
Lee, Tzong-Yi
collection PubMed
description In clinical microbiology, matrix-assisted laser desorption ionization–time-of-flight mass spectrometry (MALDI-TOF MS) is frequently employed for rapid microbial identification. However, rapid identification of antimicrobial resistance (AMR) in Escherichia coli based on a large amount of MALDI-TOF MS data has not yet been reported. This may be because building a single prediction model that covers all E. coli isolates is challenging given the high diversity of the E. coli population. This study aimed to develop a MALDI-TOF MS-based, data-driven, two-stage framework for characterizing different AMRs in E. coli. Specifically, amoxicillin (AMC), ceftazidime (CAZ), ciprofloxacin (CIP), ceftriaxone (CRO), and cefuroxime (CXM) were used. In the first stage, we split the data into two groups based on informative peaks selected by random forest feature importance. In the second stage, prediction models were constructed using four different machine learning algorithms: logistic regression, support vector machine, random forest, and extreme gradient boosting (XGBoost). The findings demonstrate that XGBoost outperformed the other three machine learning models. The values of the area under the receiver operating characteristic curve (AUC) were 0.62, 0.72, 0.87, 0.72, and 0.72 for AMC, CAZ, CIP, CRO, and CXM, respectively. These results suggest that a data-driven, two-stage framework could improve accuracy by approximately 2.8%. In summary, we developed AMR prediction models for E. coli using a data-driven two-stage framework, which is promising for assisting physicians in making decisions. Further, the analysis of informative peaks in future studies could potentially reveal new insights.
IMPORTANCE Based on a large amount of MALDI-TOF MS clinical data, comprising 37,918 Escherichia coli isolates, a data-driven two-stage framework was established to evaluate the antimicrobial resistance of E. coli. Five antibiotics (AMC, CAZ, CIP, CRO, and CXM) were considered for the two-stage model training, and the AUC values were 0.62 for AMC, 0.72 for CAZ, 0.87 for CIP, 0.72 for CRO, and 0.72 for CXM. Further investigations revealed that the informative peak m/z 9714 appeared with some important peaks at m/z 6809, m/z 7650, m/z 10534, and m/z 11783 for CIP and at m/z 6809, m/z 10475, and m/z 8447 for CAZ, CRO, and CXM. This framework has the potential to improve accuracy by approximately 2.8%, which makes it a promising direction for further research.
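The two-stage idea in the description above can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic binary peak indicators, the split uses only the single highest-ranked peak (an assumption, since the record does not state how many peaks define the groups or how peaks were binarized), and scikit-learn's GradientBoostingClassifier stands in for XGBoost. All function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def make_synthetic_spectra(n_samples=600, n_peaks=20, seed=0):
    """Toy data: X[i, j] = 1 if peak j is present in isolate i; y = resistant (1) or not (0)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_samples, n_peaks))
    # The peak linked to resistance depends on whether peak 0 is present.
    driver = np.where(X[:, 0] == 1, X[:, 1], X[:, 2])
    flip = rng.random(n_samples) < 0.1  # 10% label noise
    y = np.where(flip, 1 - driver, driver)
    return X, y


def pick_informative_peak(X, y, seed=0):
    """Stage 1: rank peaks by random forest importance and return the top index."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X, y)
    return int(np.argmax(rf.feature_importances_))


def fit_two_stage(X, y, seed=0):
    """Stage 2: train one classifier per group (informative peak absent / present)."""
    peak = pick_informative_peak(X, y, seed)
    models = {}
    for present in (0, 1):
        mask = X[:, peak] == present
        # Stand-in for XGBoost; assumes both classes occur in each group.
        model = GradientBoostingClassifier(random_state=seed)
        model.fit(X[mask], y[mask])
        models[present] = model
    return peak, models


def predict_two_stage(peak, models, X):
    """Route each isolate to the model of its group and return resistance scores."""
    scores = np.empty(len(X), dtype=float)
    for present, model in models.items():
        mask = X[:, peak] == present
        if mask.any():
            scores[mask] = model.predict_proba(X[mask])[:, 1]
    return scores


X, y = make_synthetic_spectra()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
peak, models = fit_two_stage(X_train, y_train)
scores = predict_two_stage(peak, models, X_test)
auc = roc_auc_score(y_test, scores)
print(f"split on peak index {peak}, test AUC = {auc:.2f}")
```

Fitting a separate model per group lets each model specialize on a more homogeneous subpopulation, which is the motivation the description gives for not using a single model across all isolates. The AUC printed here reflects only the toy data and says nothing about the reported results.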
format Online
Article
Text
id pubmed-10269626
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-10269626 2023-06-16 Microbiol Spectr Research Article American Society for Microbiology 2023-04-12 /pmc/articles/PMC10269626/ /pubmed/37042778 http://dx.doi.org/10.1128/spectrum.03479-22 Text en Copyright © 2023 Chung et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10269626/
https://www.ncbi.nlm.nih.gov/pubmed/37042778
http://dx.doi.org/10.1128/spectrum.03479-22