
Amino acid "little Big Bang": Representing amino acid substitution matrices as dot products of Euclidian vectors

BACKGROUND: Sequence comparisons make use of a one-letter representation for amino acids, the necessary quantitative information being supplied by the substitution matrices. This paper deals with the problem of finding a representation that provides a comprehensive description of amino acid intrinsi...


Bibliographic Details
Main Authors: Zimmermann, Karel; Gibrat, Jean-François
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098074/
https://www.ncbi.nlm.nih.gov/pubmed/20047649
http://dx.doi.org/10.1186/1471-2105-11-4
author Zimmermann, Karel
Gibrat, Jean-François
collection PubMed
description BACKGROUND: Sequence comparisons make use of a one-letter representation for amino acids, the necessary quantitative information being supplied by the substitution matrices. This paper deals with the problem of finding a representation that provides a comprehensive description of amino acid intrinsic properties consistent with the substitution matrices. RESULTS: We present a Euclidian vector representation of the amino acids, obtained by the singular value decomposition of the substitution matrices. The substitution matrix entries correspond to the dot product of amino acid vectors. We apply this vector encoding to the study of the relative importance of various amino acid physicochemical properties upon the substitution matrices. We also characterize and compare the PAM and BLOSUM series substitution matrices. CONCLUSIONS: This vector encoding introduces a Euclidian metric in the amino acid space, consistent with substitution matrices. Such a numerical description of the amino acid is useful when intrinsic properties of amino acids are necessary, for instance, building sequence profiles or finding consensus sequences, using machine learning algorithms such as Support Vector Machine and Neural Networks algorithms.
format Text
id pubmed-3098074
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3098074 2011-05-20 BMC Bioinformatics Research Article BioMed Central 2010-01-04 /pmc/articles/PMC3098074/ /pubmed/20047649 http://dx.doi.org/10.1186/1471-2105-11-4 Text en Copyright ©2010 Zimmermann and Gibrat; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Amino acid "little Big Bang": Representing amino acid substitution matrices as dot products of Euclidian vectors
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098074/
https://www.ncbi.nlm.nih.gov/pubmed/20047649
http://dx.doi.org/10.1186/1471-2105-11-4
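
Illustrative sketch (not part of the catalog record): the abstract states that amino acid vectors are obtained by singular value decomposition of a substitution matrix, so that matrix entries correspond to dot products of those vectors. The Python snippet below shows that idea on a small, hypothetical 4x4 symmetric score matrix. The matrix values, the function name vectors_from_matrix, and the choice of NumPy are assumptions made for illustration; they are not taken from the paper. The toy matrix is chosen to be positive definite so that real vectors reproduce it exactly. Real substitution matrices may have negative eigenvalues, in which case a plain real dot product cannot reproduce every entry, and this record does not say how the authors treat that case.

```python
import numpy as np

# Hypothetical symmetric score matrix for four made-up residue types.
# It is diagonally dominant with a positive diagonal, hence positive definite.
S = np.array([
    [ 4.0,  1.0,  0.0, -1.0],
    [ 1.0,  5.0, -1.0,  0.0],
    [ 0.0, -1.0,  6.0,  1.0],
    [-1.0,  0.0,  1.0,  4.0],
])

def vectors_from_matrix(scores):
    """Return one row vector per residue so that X @ X.T approximates scores.

    Assumes scores is symmetric positive semidefinite, in which case the
    left and right singular vectors coincide and S = U diag(s) U^T.
    """
    u, s, _ = np.linalg.svd(scores)
    # Scale each singular vector (column) by the square root of its singular value.
    return u * np.sqrt(s)

X = vectors_from_matrix(S)

# Dot products of the row vectors recover the matrix entries.
reconstructed = X @ X.T

# The same vectors induce a Euclidean distance between residues 0 and 1:
# ||x0 - x1||^2 = S[0,0] + S[1,1] - 2*S[0,1].
dist_01 = np.linalg.norm(X[0] - X[1])

print(np.allclose(reconstructed, S))
print(dist_01 ** 2)
```

Under these assumptions, each row of X plays the role of a numerical encoding of one residue, which is the kind of input the abstract mentions for sequence profiles or machine learning methods.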