A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package

Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent se...

Bibliographic Details
Main authors: Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M.
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology (RNCSB) Organization 2013
Subjects: Mini Reviews
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3962227/
https://www.ncbi.nlm.nih.gov/pubmed/24688703
http://dx.doi.org/10.5936/csbj.201302010
_version_ 1782308403469615104
author Motomura, Kenta
Nakamura, Morikazu
Otaki, Joji M.
collection PubMed
description Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs.
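The availability score described above (the real frequency of an SCS relative to its probabilistically expected frequency) can be illustrated with a minimal Python sketch. This is not the SCS Package implementation: the function name `availability_scores`, the parameter `k`, and the choice of expected frequency (the product of the single-residue frequencies in the same input set) are assumptions made for illustration, and a small list of strings stands in for the non-redundant database.

```python
from collections import Counter
from math import prod


def availability_scores(sequences, k=3):
    """Return {SCS: observed frequency / expected frequency} for k-residue windows.

    Assumption (not taken from the record): the expected frequency of an SCS
    is the product of the frequencies of its individual residues, measured
    on the same set of sequences.
    """
    residue_counts = Counter()
    scs_counts = Counter()
    for seq in sequences:
        residue_counts.update(seq)
        # Slide a window of length k along the sequence, one residue at a time.
        scs_counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))

    total_residues = sum(residue_counts.values())
    total_scs = sum(scs_counts.values())
    residue_freq = {aa: n / total_residues for aa, n in residue_counts.items()}

    scores = {}
    for scs, n in scs_counts.items():
        observed = n / total_scs
        expected = prod(residue_freq[aa] for aa in scs)
        scores[scs] = observed / expected
    return scores


if __name__ == "__main__":
    # Toy input; a real analysis would read a large protein database instead.
    toy = ["MKTAYIAKQR", "MKTLLVAKQR", "AKQRMKT"]
    for scs, score in sorted(availability_scores(toy, k=3).items()):
        print(scs, round(score, 2))
```

Under these assumptions, a score above 1 marks an SCS that occurs more often than chance predicts, and a score below 1 marks one that occurs less often. An SCS that never appears in the input is simply absent from the returned dictionary, which corresponds to an observed frequency of zero.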
format Online
Article
Text
id pubmed-3962227
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Research Network of Computational and Structural Biotechnology (RNCSB) Organization
record_format MEDLINE/PubMed
spelling pubmed-3962227 2014-03-31
A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package
Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M.
Comput Struct Biotechnol J, Mini Reviews
Research Network of Computational and Structural Biotechnology (RNCSB) Organization 2013-03-29
/pmc/articles/PMC3962227/ /pubmed/24688703 http://dx.doi.org/10.5936/csbj.201302010
Text en © Motomura et al. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly cited.
title A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package
topic Mini Reviews