
An unsupervised machine learning method for assessing quality of tandem mass spectra


Bibliographic Details
Main authors: Lin, Wenjun; Wang, Jianxin; Zhang, Wen-Jun; Wu, Fang-Xiang
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3380733/
https://www.ncbi.nlm.nih.gov/pubmed/22759570
http://dx.doi.org/10.1186/1477-5956-10-S1-S12
author Lin, Wenjun
Wang, Jianxin
Zhang, Wen-Jun
Wu, Fang-Xiang
author_sort Lin, Wenjun
collection PubMed
description BACKGROUND: In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, the majority of these spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very useful in the pipeline of protein identification via tandem mass spectra, especially for reducing search time and decreasing false identifications. Most existing methods for quality assessment are supervised machine learning methods based on a number of features that describe the quality of tandem mass spectra. These methods need training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data. RESULTS: This study proposes an unsupervised machine learning method for quality assessment of tandem mass spectra that requires no training dataset. The proposed method estimates the conditional probabilities of spectra being of high quality from the quality assessments based on individual features. The probabilities are estimated by solving a constrained optimization problem. An efficient algorithm is developed to solve this problem and is proved to converge. Experimental results on two datasets show that searching only the tandem spectra judged to be of high quality by the proposed method saves about 56% and 62% of database searching time, respectively, while losing only a small number of high-quality spectra. CONCLUSIONS: The results indicate that the proposed method performs well for quality assessment of tandem mass spectra and that the way the conditional probabilities are estimated is effective.
format Online
Article
Text
id pubmed-3380733
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3380733 2012-06-25 An unsupervised machine learning method for assessing quality of tandem mass spectra Lin, Wenjun Wang, Jianxin Zhang, Wen-Jun Wu, Fang-Xiang Proteome Sci Proceedings
BioMed Central 2012-06-21 /pmc/articles/PMC3380733/ /pubmed/22759570 http://dx.doi.org/10.1186/1477-5956-10-S1-S12 Text en Copyright ©2012 Lin et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An unsupervised machine learning method for assessing quality of tandem mass spectra
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3380733/
https://www.ncbi.nlm.nih.gov/pubmed/22759570
http://dx.doi.org/10.1186/1477-5956-10-S1-S12