A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package
Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent se...
Main authors: | Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology (RNCSB) Organization, 2013 |
Subjects: | Mini Reviews |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3962227/ https://www.ncbi.nlm.nih.gov/pubmed/24688703 http://dx.doi.org/10.5936/csbj.201302010 |
_version_ | 1782308403469615104 |
---|---|
author | Motomura, Kenta Nakamura, Morikazu Otaki, Joji M. |
author_facet | Motomura, Kenta Nakamura, Morikazu Otaki, Joji M. |
author_sort | Motomura, Kenta |
collection | PubMed |
description | Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs. |
format | Online Article Text |
id | pubmed-3962227 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Research Network of Computational and Structural Biotechnology (RNCSB) Organization |
record_format | MEDLINE/PubMed |
spelling | pubmed-3962227 2014-03-31 A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package Motomura, Kenta Nakamura, Morikazu Otaki, Joji M. Comput Struct Biotechnol J Mini Reviews Research Network of Computational and Structural Biotechnology (RNCSB) Organization 2013-03-29 /pmc/articles/PMC3962227/ /pubmed/24688703 http://dx.doi.org/10.5936/csbj.201302010 Text en © Motomura et al. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly cited. |
spellingShingle | Mini Reviews Motomura, Kenta Nakamura, Morikazu Otaki, Joji M. A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package |
title | A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package |
topic | Mini Reviews |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3962227/ https://www.ncbi.nlm.nih.gov/pubmed/24688703 http://dx.doi.org/10.5936/csbj.201302010 |
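
The description above defines availability scores as real SCS frequencies relative to their probabilistically expected frequencies. A minimal sketch of that idea follows, under stated assumptions: the expected count of a word is taken from the product of its single-residue frequencies (an independence model), the score is a plain observed/expected ratio, and the function name `availability_scores` and the toy sequences are hypothetical. The exact normalization and database used by the article and the SCS Package may differ.

```python
from collections import Counter
from math import prod


def availability_scores(sequences, k=3):
    """Return observed/expected ratios for every length-k word in `sequences`.

    Assumption: expected counts come from an independence model built on
    single-residue frequencies; this is an illustrative choice, not
    necessarily the article's exact definition.
    """
    residue_counts = Counter()
    word_counts = Counter()
    for seq in sequences:
        residue_counts.update(seq)
        # Count every overlapping window of length k.
        word_counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))

    total_residues = sum(residue_counts.values())
    total_words = sum(word_counts.values())
    residue_freq = {aa: n / total_residues for aa, n in residue_counts.items()}

    scores = {}
    for word, observed in word_counts.items():
        # Expected count if residues were drawn independently.
        expected = total_words * prod(residue_freq[aa] for aa in word)
        scores[word] = observed / expected
    return scores


# Toy input (hypothetical); a real analysis would use a non-redundant database.
toy_sequences = ["AAAA", "AAAC"]
scores = availability_scores(toy_sequences, k=2)
```

In this sketch a score above 1 marks a word seen more often than the independence model predicts, and a score below 1 marks one seen less often. Words that never occur are not listed, so rare or non-existent SCSs would need a separate enumeration of all possible words.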