3D representations of amino acids—applications to protein sequence comparison and classification

The amino acid sequence of a protein is the key to understanding its structure and ultimately its function in the cell. This paper addresses the fundamental issue of encoding amino acids in such a way that the representation of a protein sequence facilitates the decoding of its information content. W...

Bibliographic Details
Main Authors: Li, Jie; Koehl, Patrice
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4212284/
https://www.ncbi.nlm.nih.gov/pubmed/25379143
http://dx.doi.org/10.1016/j.csbj.2014.09.001
_version_ 1782341683968475136
author Li, Jie
Koehl, Patrice
author_facet Li, Jie
Koehl, Patrice
author_sort Li, Jie
collection PubMed
description The amino acid sequence of a protein is the key to understanding its structure and ultimately its function in the cell. This paper addresses the fundamental issue of encoding amino acids in such a way that the representation of a protein sequence facilitates the decoding of its information content. We show that a feature-based representation in a three-dimensional (3D) space derived from amino acid substitution matrices provides an adequate representation that can be used for direct comparison of protein sequences based on geometry. We measure the performance of such a representation in the context of the protein structural fold prediction problem. We compare the results of classifying different sets of proteins belonging to distinct structural folds against classifications of the same proteins obtained from sequence alone or directly from structural information. We find that sequence alone performs poorly as a structure classifier. We show, in contrast, that the use of the three-dimensional representation of the sequences significantly improves the classification accuracy. We conclude with a discussion of the current limitations of such a representation and with a description of potential improvements.
format Online
Article
Text
id pubmed-4212284
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-4212284 2014-11-06 3D representations of amino acids—applications to protein sequence comparison and classification Li, Jie Koehl, Patrice Comput Struct Biotechnol J Article Research Network of Computational and Structural Biotechnology 2014-09-06 /pmc/articles/PMC4212284/ /pubmed/25379143 http://dx.doi.org/10.1016/j.csbj.2014.09.001 Text en © 2014 Li and Koehl. Published by Elsevier B.V. on behalf of the Research Network of Computational and Structural Biotechnology.
spellingShingle Article
Li, Jie
Koehl, Patrice
3D representations of amino acids—applications to protein sequence comparison and classification
title 3D representations of amino acids—applications to protein sequence comparison and classification
title_sort 3d representations of amino acids—applications to protein sequence comparison and classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4212284/
https://www.ncbi.nlm.nih.gov/pubmed/25379143
http://dx.doi.org/10.1016/j.csbj.2014.09.001