
Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library

Spectrum matching is the most common method for compound identification in mass spectrometry (MS). However, some challenges limit its efficiency, including the coverage of spectral libraries, the accuracy, and the speed of matching. In this study, a million-scale in-silico EI-MS library is establish...


Bibliographic Details
Main Authors: Yang, Qiong, Ji, Hongchao, Xu, Zhenbo, Li, Yiming, Wang, Pingshan, Sun, Jinyu, Fan, Xiaqiong, Zhang, Hailiang, Lu, Hongmei, Zhang, Zhimin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287733/
https://www.ncbi.nlm.nih.gov/pubmed/37349295
http://dx.doi.org/10.1038/s41467-023-39279-7
author Yang, Qiong
Ji, Hongchao
Xu, Zhenbo
Li, Yiming
Wang, Pingshan
Sun, Jinyu
Fan, Xiaqiong
Zhang, Hailiang
Lu, Hongmei
Zhang, Zhimin
author_sort Yang, Qiong
collection PubMed
description Spectrum matching is the most common method for compound identification in mass spectrometry (MS). However, some challenges limit its efficiency, including the coverage of spectral libraries, the accuracy, and the speed of matching. In this study, a million-scale in-silico EI-MS library is established. Furthermore, an ultra-fast and accurate spectrum matching (FastEI) method is proposed to substantially improve accuracy using Word2vec spectral embedding and boost the speed using the hierarchical navigable small-world graph (HNSW). It achieves 80.4% recall@10 accuracy (88.3% with 5 Da mass filter) with a speedup of two orders of magnitude compared with the weighted cosine similarity method (WCS). When FastEI is applied to identify the molecules beyond NIST 2017 library, it achieves 50% recall@1 accuracy. FastEI is packaged as a standalone and user-friendly software for common users with limited computational backgrounds. Overall, FastEI combined with a million-scale in-silico library facilitates compound identification as an accurate and ultra-fast tool.
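
The description above compares FastEI against weighted cosine similarity (WCS) and reports results as recall@k. As a rough illustration of those two ideas only, here is a minimal Python sketch. It is not the FastEI method (no Word2vec embedding, no HNSW index), and the function names, the dictionary representation of a spectrum, and the weighting exponents are assumptions made for this example rather than values taken from the record.

```python
import math


def weighted_cosine(spec_a, spec_b, mz_power=3.0, intensity_power=0.6):
    """Weighted cosine similarity between two spectra.

    Each spectrum is a dict {integer m/z: intensity}. Every peak is
    weighted as (m/z ** mz_power) * (intensity ** intensity_power).
    The default exponents are illustrative assumptions.
    """
    def weigh(spec):
        return {mz: (mz ** mz_power) * (inten ** intensity_power)
                for mz, inten in spec.items()}

    wa, wb = weigh(spec_a), weigh(spec_b)
    # Only peaks present in both spectra contribute to the dot product.
    dot = sum(w * wb[mz] for mz, w in wa.items() if mz in wb)
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(query, library):
    """Return library compound ids sorted from best to worst match (brute force)."""
    return sorted(library,
                  key=lambda cid: weighted_cosine(query, library[cid]),
                  reverse=True)


def recall_at_k(ranked_ids_per_query, true_ids, k):
    """Fraction of queries whose true compound is among the top-k ranked hits."""
    hits = sum(1 for ranked, truth in zip(ranked_ids_per_query, true_ids)
               if truth in ranked[:k])
    return hits / len(true_ids)


# Toy data, invented for illustration only.
toy_library = {"A": {41: 100.0, 43: 50.0}, "B": {77: 100.0, 105: 80.0}}
toy_query = {41: 90.0, 43: 60.0}
toy_ranking = rank_library(toy_query, toy_library)
```

The brute-force ranking scores the query against every library entry, which is the cost that an approximate nearest-neighbour index such as HNSW is meant to avoid on a million-scale library.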
format Online
Article
Text
id pubmed-10287733
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10287733 2023-06-24
Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library
Yang, Qiong; Ji, Hongchao; Xu, Zhenbo; Li, Yiming; Wang, Pingshan; Sun, Jinyu; Fan, Xiaqiong; Zhang, Hailiang; Lu, Hongmei; Zhang, Zhimin
Nat Commun
Article
Spectrum matching is the most common method for compound identification in mass spectrometry (MS). However, some challenges limit its efficiency, including the coverage of spectral libraries, the accuracy, and the speed of matching. In this study, a million-scale in-silico EI-MS library is established. Furthermore, an ultra-fast and accurate spectrum matching (FastEI) method is proposed to substantially improve accuracy using Word2vec spectral embedding and boost the speed using the hierarchical navigable small-world graph (HNSW). It achieves 80.4% recall@10 accuracy (88.3% with 5 Da mass filter) with a speedup of two orders of magnitude compared with the weighted cosine similarity method (WCS). When FastEI is applied to identify the molecules beyond NIST 2017 library, it achieves 50% recall@1 accuracy. FastEI is packaged as a standalone and user-friendly software for common users with limited computational backgrounds. Overall, FastEI combined with a million-scale in-silico library facilitates compound identification as an accurate and ultra-fast tool.
Nature Publishing Group UK 2023-06-22
/pmc/articles/PMC10287733/ /pubmed/37349295 http://dx.doi.org/10.1038/s41467-023-39279-7
Text en
© The Author(s) 2023
https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287733/
https://www.ncbi.nlm.nih.gov/pubmed/37349295
http://dx.doi.org/10.1038/s41467-023-39279-7