Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences
Here, we describe a unique probabilistic evaluation of the 20, naturally occurring, amino acids and their distributions within the Swiss-Prot and Complete Human Genebank databases. We have developed a computational technique that imparts both directionality and length constraints into searches for u...
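The comparison the abstract describes, observed motif counts set against the counts expected from the underlying amino acid distribution, can be illustrated with a minimal sketch. This record does not give the authors' actual statistic, so the version below assumes a simple independence null model (a motif's expected probability is the product of its residues' pooled frequencies). The function name `motif_ratios` and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def motif_ratios(sequences, k=2):
    """Return {motif: observed / expected} for every k-residue motif seen.

    Assumption (not from the paper): expected counts treat residues as
    independent draws from the pooled single-residue frequencies.
    """
    residue_counts = Counter()
    motif_counts = Counter()
    for seq in sequences:
        residue_counts.update(seq)
        # Overlapping windows are read left to right, so "CM" and "MC"
        # are counted as different motifs (directionality is kept).
        motif_counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))

    total_residues = sum(residue_counts.values())
    total_motifs = sum(motif_counts.values())

    ratios = {}
    for motif, observed in motif_counts.items():
        probability = 1.0
        for residue in motif:
            probability *= residue_counts[residue] / total_residues
        ratios[motif] = observed / (probability * total_motifs)
    return ratios


# Made-up sequences for illustration only; these are not real proteins.
toy_sequences = ["MCCMAK", "ACCMCC", "KAMCCM"]
ratios = motif_ratios(toy_sequences, k=2)
for motif in sorted(ratios, key=ratios.get, reverse=True):
    print(motif, round(ratios[motif], 2))
```

A ratio above 1 marks a motif that occurs more often than the null model predicts, and a ratio below 1 marks one that occurs less often. Calling the function with `k=3` applies the same comparison to three-residue motifs such as CCM.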
Main Authors: | Shen, Shiyi; Kai, Bo; Ruan, Jishou; Torin Huzil, J.; Carpenter, Eric; Tuszynski, Jack A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier B.V. 2006 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7127678/ https://www.ncbi.nlm.nih.gov/pubmed/32288076 http://dx.doi.org/10.1016/j.physa.2006.03.004 |
_version_ | 1783516413033447424 |
---|---|
author | Shen, Shiyi Kai, Bo Ruan, Jishou Torin Huzil, J. Carpenter, Eric Tuszynski, Jack A. |
author_facet | Shen, Shiyi Kai, Bo Ruan, Jishou Torin Huzil, J. Carpenter, Eric Tuszynski, Jack A. |
author_sort | Shen, Shiyi |
collection | PubMed |
description | Here, we describe a unique probabilistic evaluation of the 20, naturally occurring, amino acids and their distributions within the Swiss-Prot and Complete Human Genebank databases. We have developed a computational technique that imparts both directionality and length constraints into searches for unique combinations of amino acids within protein sequences. Using statistical approaches, we have carried out searches of all possible two- and three-residue motifs contained within these databases. This technique is based on the unusually high occurrence of a small number of these motifs when compared to the expected probability of finding a specific residue grouping within a given database. Subsequent filtering of this search to identify such unique combinations has provided several examples that can be used as markers to identify particular proteins within or across databases. We focus on three of these motifs, which were found to be of greatest interest to us. The CC, CM and a combination of the two, CCM motifs all occur either more or less frequently than would be predicted based on standard amino acid distributions within the entire human proteome. |
format | Online Article Text |
id | pubmed-7127678 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | Elsevier B.V. |
record_format | MEDLINE/PubMed |
spelling | pubmed-71276782020-04-08 Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences Shen, Shiyi Kai, Bo Ruan, Jishou Torin Huzil, J. Carpenter, Eric Tuszynski, Jack A. Physica A Article Here, we describe a unique probabilistic evaluation of the 20, naturally occurring, amino acids and their distributions within the Swiss-Prot and Complete Human Genebank databases. We have developed a computational technique that imparts both directionality and length constraints into searches for unique combinations of amino acids within protein sequences. Using statistical approaches, we have carried out searches of all possible two- and three-residue motifs contained within these databases. This technique is based on the unusually high occurrence of a small number of these motifs when compared to the expected probability of finding a specific residue grouping within a given database. Subsequent filtering of this search to identify such unique combinations has provided several examples that can be used as markers to identify particular proteins within or across databases. We focus on three of these motifs, which were found to be of greatest interest to us. The CC, CM and a combination of the two, CCM motifs all occur either more or less frequently than would be predicted based on standard amino acid distributions within the entire human proteome. Elsevier B.V. 2006-10-15 2006-04-03 /pmc/articles/PMC7127678/ /pubmed/32288076 http://dx.doi.org/10.1016/j.physa.2006.03.004 Text en Copyright © 2006 Elsevier B.V. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. 
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Shen, Shiyi Kai, Bo Ruan, Jishou Torin Huzil, J. Carpenter, Eric Tuszynski, Jack A. Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences |
title | Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences |
title_full | Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences |
title_fullStr | Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences |
title_full_unstemmed | Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences |
title_short | Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences |
title_sort | probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7127678/ https://www.ncbi.nlm.nih.gov/pubmed/32288076 http://dx.doi.org/10.1016/j.physa.2006.03.004 |
work_keys_str_mv | AT shenshiyi probabilisticanalysisofthefrequenciesofaminoacidpairswithincharacterizedproteinsequences AT kaibo probabilisticanalysisofthefrequenciesofaminoacidpairswithincharacterizedproteinsequences AT ruanjishou probabilisticanalysisofthefrequenciesofaminoacidpairswithincharacterizedproteinsequences AT torinhuzilj probabilisticanalysisofthefrequenciesofaminoacidpairswithincharacterizedproteinsequences AT carpentereric probabilisticanalysisofthefrequenciesofaminoacidpairswithincharacterizedproteinsequences AT tuszynskijacka probabilisticanalysisofthefrequenciesofaminoacidpairswithincharacterizedproteinsequences |