Revisiting amino acid substitution matrices for identifying distantly related proteins

Motivation: Although many amino acid substitution matrices have been developed, it has not been well understood which is the best for similarity searches, especially for remote homology detection. Therefore, we collected information related to existing matrices, condensed it and derived a novel matr...

Bibliographic Details
Main authors: Yamada, Kazunori; Tomii, Kentaro
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3904525/
https://www.ncbi.nlm.nih.gov/pubmed/24281694
http://dx.doi.org/10.1093/bioinformatics/btt694
author Yamada, Kazunori
Tomii, Kentaro
collection PubMed
description Motivation: Although many amino acid substitution matrices have been developed, it has not been well understood which is the best for similarity searches, especially for remote homology detection. Therefore, we collected information related to existing matrices, condensed it and derived a novel matrix that can detect more remote homology than ever.

Results: Using principal component analysis with existing matrices and benchmarks, we developed a novel matrix, which we designate as MIQS. The detection performance of MIQS is validated and compared with that of existing general purpose matrices using SSEARCH with optimized gap penalties for each matrix. Results show that MIQS is able to detect more remote homology than the existing matrices on an independent dataset. In addition, the performance of our developed matrix was superior to that of CS-BLAST, which was a novel similarity search method with no amino acid matrix. We also evaluated the alignment quality of matrices and methods, which revealed that MIQS shows higher alignment sensitivity than that with the existing matrix series and CS-BLAST. Fundamentally, these results are expected to constitute good proof of the availability and/or importance of amino acid matrices in sequence analysis. Moreover, with our developed matrix, sophisticated similarity search methods such as sequence–profile and profile–profile comparison methods can be improved further.

Availability and implementation: Newly developed matrices and datasets used for this study are available at http://csas.cbrc.jp/Ssearch/.

Contact: k-tomii@aist.go.jp

Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-3904525
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3904525 2014-01-28 Revisiting amino acid substitution matrices for identifying distantly related proteins. Yamada, Kazunori; Tomii, Kentaro. Bioinformatics. Original Papers.
Oxford University Press 2014-02-01 2013-11-26 /pmc/articles/PMC3904525/ /pubmed/24281694 http://dx.doi.org/10.1093/bioinformatics/btt694 Text en
© The Author 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Revisiting amino acid substitution matrices for identifying distantly related proteins
topic Original Papers