Bacterial species identification using MALDI-TOF mass spectrometry and machine learning techniques: A large-scale benchmarking study

Today, machine learning methods are commonly deployed for bacterial species identification using MALDI-TOF mass spectrometry data. However, most of the studies reported in the literature only consider very traditional machine learning methods on small datasets that contain a limited number of species. In...

Bibliographic Details
Main authors: Mortier, Thomas; Wieme, Anneleen D.; Vandamme, Peter; Waegeman, Willem
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8649224/
https://www.ncbi.nlm.nih.gov/pubmed/34938408
http://dx.doi.org/10.1016/j.csbj.2021.11.004
_version_ 1784610947779788800
author Mortier, Thomas
Wieme, Anneleen D.
Vandamme, Peter
Waegeman, Willem
author_facet Mortier, Thomas
Wieme, Anneleen D.
Vandamme, Peter
Waegeman, Willem
author_sort Mortier, Thomas
collection PubMed
description Today, machine learning methods are commonly deployed for bacterial species identification using MALDI-TOF mass spectrometry data. However, most of the studies reported in the literature only consider very traditional machine learning methods on small datasets that contain a limited number of species. In this paper we present benchmarking results on an unprecedented scale for a wide range of machine learning methods, using datasets that contain almost 100,000 spectra and more than 1000 different species. The size and the diversity of the data allow us to compare three important identification scenarios that are often not distinguished in the literature, i.e., identification for novel biological replicates, novel strains, and novel species that are not present in the training data. The results demonstrate that acceptable identification rates are obtained in all three scenarios, but the numbers are typically lower than those reported in studies with a more limited analysis. Using hierarchical classification methods, we also demonstrate that taxonomic information is in general not well preserved in MALDI-TOF mass spectrometry data. For the novel species scenario, we apply, for the first time, neural networks with Monte Carlo dropout to the detection of novel species; such networks have been shown to be successful in other domains, such as computer vision.
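The abstract mentions neural networks with Monte Carlo dropout for detecting novel species. The record does not describe the authors' architecture or decision rule, so the following is only a minimal sketch of the general technique under stated assumptions: a toy two-layer network with random (untrained) weights, dropout kept active at prediction time, and the entropy of the averaged class probabilities used as an uncertainty score. All names (`mc_dropout_predict`, `flag_novel`, `n_bins`, the threshold value) are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, W1, b1, W2, b2, p_drop=0.5, n_samples=100, rng=None):
    """Average class probabilities over n_samples stochastic forward passes.

    Dropout stays active at prediction time, so each pass uses a different
    random mask on the hidden layer (the core idea of Monte Carlo dropout).
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.empty((n_samples, x.shape[0], W2.shape[1]))
    for t in range(n_samples):
        h = np.maximum(x @ W1 + b1, 0.0)      # ReLU hidden layer
        mask = rng.random(h.shape) >= p_drop  # keep a unit with prob. 1 - p_drop
        h = h * mask / (1.0 - p_drop)         # inverted dropout scaling
        probs[t] = softmax(h @ W2 + b2)
    mean_probs = probs.mean(axis=0)
    # Predictive entropy of the averaged distribution, one value per spectrum.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
    return mean_probs, entropy

def flag_novel(entropy, threshold):
    # Assumption: a spectrum is flagged as a possible novel species
    # when its predictive entropy exceeds a user-chosen threshold.
    return entropy > threshold

# Toy data: 4 synthetic "spectra" with 50 intensity bins and 5 known species.
rng = np.random.default_rng(0)
n_bins, n_hidden, n_species = 50, 32, 5
W1 = rng.normal(scale=0.1, size=(n_bins, n_hidden))   # untrained weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_species))
b2 = np.zeros(n_species)
spectra = rng.random((4, n_bins))

mean_probs, entropy = mc_dropout_predict(spectra, W1, b1, W2, b2, rng=rng)
novel = flag_novel(entropy, threshold=0.9 * np.log(n_species))
```

In practice the weights would come from a network trained on labelled spectra, and the threshold would be tuned on held-out species; both choices are outside what this record states.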
format Online
Article
Text
id pubmed-8649224
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8649224 2021-12-21 Bacterial species identification using MALDI-TOF mass spectrometry and machine learning techniques: A large-scale benchmarking study Mortier, Thomas Wieme, Anneleen D. Vandamme, Peter Waegeman, Willem Comput Struct Biotechnol J Research Article Today, machine learning methods are commonly deployed for bacterial species identification using MALDI-TOF mass spectrometry data. However, most of the studies reported in the literature only consider very traditional machine learning methods on small datasets that contain a limited number of species. In this paper we present benchmarking results on an unprecedented scale for a wide range of machine learning methods, using datasets that contain almost 100,000 spectra and more than 1000 different species. The size and the diversity of the data allow us to compare three important identification scenarios that are often not distinguished in the literature, i.e., identification for novel biological replicates, novel strains, and novel species that are not present in the training data. The results demonstrate that acceptable identification rates are obtained in all three scenarios, but the numbers are typically lower than those reported in studies with a more limited analysis. Using hierarchical classification methods, we also demonstrate that taxonomic information is in general not well preserved in MALDI-TOF mass spectrometry data. For the novel species scenario, we apply, for the first time, neural networks with Monte Carlo dropout to the detection of novel species; such networks have been shown to be successful in other domains, such as computer vision. Research Network of Computational and Structural Biotechnology 2021-11-09 /pmc/articles/PMC8649224/ /pubmed/34938408 http://dx.doi.org/10.1016/j.csbj.2021.11.004 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Mortier, Thomas
Wieme, Anneleen D.
Vandamme, Peter
Waegeman, Willem
Bacterial species identification using MALDI-TOF mass spectrometry and machine learning techniques: A large-scale benchmarking study
title Bacterial species identification using MALDI-TOF mass spectrometry and machine learning techniques: A large-scale benchmarking study
title_full Bacterial species identification using MALDI-TOF mass spectrometry and machine learning techniques: A large-scale benchmarking study
title_fullStr Bacterial species identification using MALDI-TOF mass spectrometry and machine learning techniques: A large-scale benchmarking study
title_full_unstemmed Bacterial species identification using MALDI-TOF mass spectrometry and machine learning techniques: A large-scale benchmarking study
title_short Bacterial species identification using MALDI-TOF mass spectrometry and machine learning techniques: A large-scale benchmarking study
title_sort bacterial species identification using maldi-tof mass spectrometry and machine learning techniques: a large-scale benchmarking study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8649224/
https://www.ncbi.nlm.nih.gov/pubmed/34938408
http://dx.doi.org/10.1016/j.csbj.2021.11.004
work_keys_str_mv AT mortierthomas bacterialspeciesidentificationusingmalditofmassspectrometryandmachinelearningtechniquesalargescalebenchmarkingstudy
AT wiemeanneleend bacterialspeciesidentificationusingmalditofmassspectrometryandmachinelearningtechniquesalargescalebenchmarkingstudy
AT vandammepeter bacterialspeciesidentificationusingmalditofmassspectrometryandmachinelearningtechniquesalargescalebenchmarkingstudy
AT waegemanwillem bacterialspeciesidentificationusingmalditofmassspectrometryandmachinelearningtechniquesalargescalebenchmarkingstudy