
Protein sequence-similarity search acceleration using a heuristic algorithm with a sensitive matrix

Protein database search for public databases is a fundamental step in the target selection of proteins in structural and functional genomics and also for inferring protein structure, function, and evolution. Most database search methods employ amino acid substitution matrices to score amino acid pai...


Bibliographic Details
Main authors:	Lim, Kyungtaek; Yamada, Kazunori D.; Frith, Martin C.; Tomii, Kentaro
Format:	Online Article Text
Language:	English
Published:	Springer Netherlands 2017
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5274646/ https://www.ncbi.nlm.nih.gov/pubmed/28083762 http://dx.doi.org/10.1007/s10969-016-9210-4

author	Lim, Kyungtaek; Yamada, Kazunori D.; Frith, Martin C.; Tomii, Kentaro
collection	PubMed
description	Protein database search for public databases is a fundamental step in the target selection of proteins in structural and functional genomics and also for inferring protein structure, function, and evolution. Most database search methods employ amino acid substitution matrices to score amino acid pairs. The choice of substitution matrix strongly affects homology detection performance. We earlier proposed a substitution matrix named MIQS that was optimized for distant protein homology search. Herein we further evaluate MIQS in combination with LAST, a heuristic and fast database search tool with a tunable sensitivity parameter m, where larger m denotes higher sensitivity. Results show that MIQS substantially improves the homology detection and alignment quality performance of LAST across diverse m parameters. Against a protein database consisting of approximately 15 million sequences, LAST with m = 10^5 achieves better homology detection performance than BLASTP, and completes the search 20 times faster. Compared to the most sensitive existing methods being used today, CS-BLAST and SSEARCH, LAST with MIQS and m = 10^6 shows comparable homology detection performance at 2.0 and 3.9 times greater speed, respectively. Results demonstrate that MIQS-powered LAST is a time-efficient method for sensitive and accurate homology search. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s10969-016-9210-4) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5274646
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Springer Netherlands
record_format	MEDLINE/PubMed
spelling	pubmed-5274646 2017-02-10 J Struct Funct Genomics Article Springer Netherlands 2017-01-12 2016 /pmc/articles/PMC5274646/ /pubmed/28083762 http://dx.doi.org/10.1007/s10969-016-9210-4 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title	Protein sequence-similarity search acceleration using a heuristic algorithm with a sensitive matrix
topic	Article


Similar Items