Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library
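Main authors: | Yang, Qiong; Ji, Hongchao; Xu, Zhenbo; Li, Yiming; Wang, Pingshan; Sun, Jinyu; Fan, Xiaqiong; Zhang, Hailiang; Lu, Hongmei; Zhang, Zhimin |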
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287733/ https://www.ncbi.nlm.nih.gov/pubmed/37349295 http://dx.doi.org/10.1038/s41467-023-39279-7 |
_version_ | 1785061936838213632 |
---|---|
author | Yang, Qiong; Ji, Hongchao; Xu, Zhenbo; Li, Yiming; Wang, Pingshan; Sun, Jinyu; Fan, Xiaqiong; Zhang, Hailiang; Lu, Hongmei; Zhang, Zhimin |
author_sort | Yang, Qiong |
collection | PubMed |
description | Spectrum matching is the most common method for compound identification in mass spectrometry (MS). However, its efficiency is limited by several challenges, including the coverage of spectral libraries and the accuracy and speed of matching. In this study, a million-scale in-silico EI-MS library is established. Furthermore, an ultra-fast and accurate spectrum-matching method, FastEI, is proposed; it substantially improves accuracy using Word2vec spectral embedding and boosts speed using a hierarchical navigable small-world (HNSW) graph. FastEI achieves 80.4% recall@10 accuracy (88.3% with a 5 Da mass filter) with a speedup of two orders of magnitude compared with the weighted cosine similarity (WCS) method. When FastEI is applied to identify molecules not included in the NIST 2017 library, it achieves 50% recall@1 accuracy. FastEI is packaged as standalone, user-friendly software for users with limited computational background. Overall, FastEI combined with a million-scale in-silico library is an accurate and ultra-fast tool for compound identification. |
format | Online Article Text |
id | pubmed-10287733 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10287733 2023-06-24 Nat Commun Article Nature Publishing Group UK 2023-06-22 /pmc/articles/PMC10287733/ /pubmed/37349295 http://dx.doi.org/10.1038/s41467-023-39279-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
title | Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287733/ https://www.ncbi.nlm.nih.gov/pubmed/37349295 http://dx.doi.org/10.1038/s41467-023-39279-7 |
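The description benchmarks FastEI against weighted cosine similarity (WCS) and reports recall@10 with and without a 5 Da mass filter. As a hedged illustration of those baseline terms only (not FastEI itself, which uses Word2vec embeddings and an HNSW index), the Python sketch below scores every library spectrum by a weighted cosine, optionally keeps only candidates within ±5 Da of a known query molecular mass, and computes recall@k. The peak weighting m/z^3 × intensity^0.6, the dictionary spectrum format, the function names `weighted_cosine`, `rank_library` and `recall_at_k`, and the toy spectra are all assumptions made for this example.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed data layout: a spectrum maps integer (nominal) m/z to intensity.
Spectrum = Dict[int, float]
# Assumed library entry: (compound id, molecular mass in Da, spectrum).
LibraryEntry = Tuple[str, float, Spectrum]


def weighted_cosine(query: Spectrum, ref: Spectrum,
                    mz_power: float = 3.0, int_power: float = 0.6) -> float:
    """Cosine similarity after weighting each peak by mz**mz_power * intensity**int_power.

    The default exponents are an assumption, not values taken from the FastEI paper.
    """
    def weigh(spec: Spectrum) -> Dict[int, float]:
        return {mz: (mz ** mz_power) * (inten ** int_power)
                for mz, inten in spec.items() if inten > 0}

    wq, wr = weigh(query), weigh(ref)
    # Only m/z values present in both spectra contribute to the dot product.
    dot = sum(v * wr[mz] for mz, v in wq.items() if mz in wr)
    norm_q = math.sqrt(sum(v * v for v in wq.values()))
    norm_r = math.sqrt(sum(v * v for v in wr.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def rank_library(query: Spectrum, library: Sequence[LibraryEntry],
                 query_mass: Optional[float] = None,
                 mass_tol: float = 5.0) -> List[str]:
    """Brute-force ranking of every library entry by weighted cosine score.

    If query_mass is given, candidates whose molecular mass differs from it by
    more than mass_tol Da are skipped (a simple mass filter).
    """
    scored = []
    for compound_id, mass, spec in library:
        if query_mass is not None and abs(mass - query_mass) > mass_tol:
            continue
        scored.append((weighted_cosine(query, spec), compound_id))
    # Highest score first; Python's sort is stable, so ties keep library order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [compound_id for _, compound_id in scored]


def recall_at_k(ranked_lists: Sequence[List[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of queries whose true compound appears among the top-k candidates."""
    if not truths:
        return 0.0
    hits = sum(1 for ranked, truth in zip(ranked_lists, truths) if truth in ranked[:k])
    return hits / len(truths)


# Toy spectra invented for illustration; they are not NIST reference data.
library = [
    ("benzene", 78.05, {51: 15.0, 52: 20.0, 77: 25.0, 78: 100.0}),
    ("toluene", 92.06, {65: 12.0, 91: 100.0, 92: 60.0}),
    ("phenol", 94.04, {65: 20.0, 66: 30.0, 94: 100.0}),
]
query = {51: 14.0, 52: 18.0, 77: 22.0, 78: 100.0}

ranked = rank_library(query, library)                      # no mass filter
filtered = rank_library(query, library, query_mass=78.05)  # 5 Da mass filter
print(ranked, filtered, recall_at_k([ranked], ["benzene"], k=1))
# prints: ['benzene', 'toluene', 'phenol'] ['benzene'] 1.0
```

Brute-force scoring like this touches every library entry for every query, which is the per-query cost that an approximate nearest-neighbour index such as HNSW is designed to avoid when the library reaches millions of entries.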