
Speeding up tandem mass spectrometry-based database searching by longest common prefix

BACKGROUND: Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or n...


Bibliographic Details
Main Authors: Zhou, Chen; Chi, Hao; Wang, Le-Heng; Li, You; Wu, Yan-Jie; Fu, Yan; Sun, Rui-Xiang; He, Si-Min
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3000425/
https://www.ncbi.nlm.nih.gov/pubmed/21108792
http://dx.doi.org/10.1186/1471-2105-11-577
author Zhou, Chen
Chi, Hao
Wang, Le-Heng
Li, You
Wu, Yan-Jie
Fu, Yan
Sun, Rui-Xiang
He, Si-Min
collection PubMed
description BACKGROUND: Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of its key challenges is the sharp increase in computational demand brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools use peptide indexing to accelerate processing. However, a peptide index requires a large amount of time and space to construct, especially for non-specific digestion, and it is inflexible to use. RESULTS: We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is always coupled to the suffix array. ABLCP eliminates redundant candidate peptides in databases and reduces the corresponding number of peptide-spectrum matches, thereby decreasing the identification time. The algorithm relies on a property of the longest common prefix. Although enzymatic digestion poses a challenge to this property, the algorithm can be adjusted to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions. CONCLUSIONS: The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm
format Text
id pubmed-3000425
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3000425 2010-12-15 Speeding up tandem mass spectrometry-based database searching by longest common prefix. Zhou, Chen; Chi, Hao; Wang, Le-Heng; Li, You; Wu, Yan-Jie; Fu, Yan; Sun, Rui-Xiang; He, Si-Min. BMC Bioinformatics. Research Article. BioMed Central 2010-11-25 /pmc/articles/PMC3000425/ /pubmed/21108792 http://dx.doi.org/10.1186/1471-2105-11-577 Text en Copyright ©2010 Zhou et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Speeding up tandem mass spectrometry-based database searching by longest common prefix
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3000425/
https://www.ncbi.nlm.nih.gov/pubmed/21108792
http://dx.doi.org/10.1186/1471-2105-11-577
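
The description above says that the longest common prefix, paired with a suffix array, removes redundant candidate peptides. The following is a minimal Python sketch of that general property only, not the authors' ABLCP implementation. All function names are hypothetical, the suffix array is built naively, a single protein string stands in for a database, and enzymatic digestion rules, modifications and mass filtering are deliberately left out. In suffix-array order, any prefix of a suffix that is no longer than its LCP value with the preceding suffix has already been produced, so it can be skipped.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a short illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # lcp[k] is the length of the longest common prefix of the suffixes
    # at ranks k-1 and k; lcp[0] is 0 by convention.
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = text[sa[k - 1]:], text[sa[k]:]
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return lcp


def distinct_peptides(text, min_len, max_len):
    # Enumerate every distinct substring with min_len <= length <= max_len
    # exactly once (a stand-in for non-specific digestion candidates).
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    out = []
    for k, start in enumerate(sa):
        # Prefixes of length <= lcp[k] are shared with the previous suffix
        # and were already emitted there, so start just past the LCP.
        first = max(min_len, lcp[k] + 1)
        last = min(max_len, len(text) - start)
        for length in range(first, last + 1):
            out.append(text[start:start + length])
    return out


if __name__ == "__main__":
    protein = "PEPTIDEPEPTIDE"  # toy sequence with a deliberate repeat
    peptides = distinct_peptides(protein, 2, 4)
    naive = sum(len(protein) - n + 1 for n in range(2, 5))
    print(len(peptides), "distinct candidates versus", naive, "naive candidates")
```

Each skipped candidate is one peptide-spectrum match that does not have to be repeated, which is the saving the abstract refers to. A real database search would additionally have to keep candidates from crossing protein boundaries and respect cleavage rules.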