An unsupervised machine learning method for assessing quality of tandem mass spectra
BACKGROUND: In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, the majority of tandem mass spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very u...
Main authors: | Lin, Wenjun; Wang, Jianxin; Zhang, Wen-Jun; Wu, Fang-Xiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3380733/ https://www.ncbi.nlm.nih.gov/pubmed/22759570 http://dx.doi.org/10.1186/1477-5956-10-S1-S12 |
_version_ | 1782236336761077760 |
---|---|
author | Lin, Wenjun Wang, Jianxin Zhang, Wen-Jun Wu, Fang-Xiang |
author_facet | Lin, Wenjun Wang, Jianxin Zhang, Wen-Jun Wu, Fang-Xiang |
author_sort | Lin, Wenjun |
collection | PubMed |
description | BACKGROUND: In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, the majority of tandem mass spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very useful in the pipeline of protein identification via tandem mass spectra, especially for reducing search time and decreasing false identifications. Most existing methods for quality assessment are supervised machine learning methods based on a number of features that describe the quality of tandem mass spectra. These methods need training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data. RESULTS: This study proposes an unsupervised machine learning method for quality assessment of tandem mass spectra that requires no training dataset. The proposed method estimates the conditional probabilities of spectra being of high quality from the quality assessments based on individual features. The probabilities are estimated by solving a constrained optimization problem. An efficient algorithm is developed to solve this problem and is proved to converge. Experimental results on two datasets show that searching only the tandem spectra judged to be of high quality by the proposed method saves about 56% and 62% of database searching time while losing only a small number of high-quality spectra. CONCLUSIONS: The results indicate that the proposed method performs well for the quality assessment of tandem mass spectra and that the way the conditional probabilities are estimated is effective. |
format | Online Article Text |
id | pubmed-3380733 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3380733 2012-06-25 An unsupervised machine learning method for assessing quality of tandem mass spectra Lin, Wenjun Wang, Jianxin Zhang, Wen-Jun Wu, Fang-Xiang Proteome Sci Proceedings BACKGROUND: In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, the majority of tandem mass spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very useful in the pipeline of protein identification via tandem mass spectra, especially for reducing search time and decreasing false identifications. Most existing methods for quality assessment are supervised machine learning methods based on a number of features that describe the quality of tandem mass spectra. These methods need training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data. RESULTS: This study proposes an unsupervised machine learning method for quality assessment of tandem mass spectra that requires no training dataset. The proposed method estimates the conditional probabilities of spectra being of high quality from the quality assessments based on individual features. The probabilities are estimated by solving a constrained optimization problem. An efficient algorithm is developed to solve this problem and is proved to converge. Experimental results on two datasets show that searching only the tandem spectra judged to be of high quality by the proposed method saves about 56% and 62% of database searching time while losing only a small number of high-quality spectra. CONCLUSIONS: The results indicate that the proposed method performs well for the quality assessment of tandem mass spectra and that the way the conditional probabilities are estimated is effective.
BioMed Central 2012-06-21 /pmc/articles/PMC3380733/ /pubmed/22759570 http://dx.doi.org/10.1186/1477-5956-10-S1-S12 Text en Copyright ©2012 Lin et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Lin, Wenjun Wang, Jianxin Zhang, Wen-Jun Wu, Fang-Xiang An unsupervised machine learning method for assessing quality of tandem mass spectra |
title | An unsupervised machine learning method for assessing quality of tandem mass spectra |
title_full | An unsupervised machine learning method for assessing quality of tandem mass spectra |
title_fullStr | An unsupervised machine learning method for assessing quality of tandem mass spectra |
title_full_unstemmed | An unsupervised machine learning method for assessing quality of tandem mass spectra |
title_short | An unsupervised machine learning method for assessing quality of tandem mass spectra |
title_sort | unsupervised machine learning method for assessing quality of tandem mass spectra |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3380733/ https://www.ncbi.nlm.nih.gov/pubmed/22759570 http://dx.doi.org/10.1186/1477-5956-10-S1-S12 |
work_keys_str_mv | AT linwenjun anunsupervisedmachinelearningmethodforassessingqualityoftandemmassspectra AT wangjianxin anunsupervisedmachinelearningmethodforassessingqualityoftandemmassspectra AT zhangwenjun anunsupervisedmachinelearningmethodforassessingqualityoftandemmassspectra AT wufangxiang anunsupervisedmachinelearningmethodforassessingqualityoftandemmassspectra AT linwenjun unsupervisedmachinelearningmethodforassessingqualityoftandemmassspectra AT wangjianxin unsupervisedmachinelearningmethodforassessingqualityoftandemmassspectra AT zhangwenjun unsupervisedmachinelearningmethodforassessingqualityoftandemmassspectra AT wufangxiang unsupervisedmachinelearningmethodforassessingqualityoftandemmassspectra |