Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences

Bibliographic Details
Main authors: Shen, Shiyi; Kai, Bo; Ruan, Jishou; Torin Huzil, J.; Carpenter, Eric; Tuszynski, Jack A.
Format: Online Article Text
Language: English
Published: Elsevier B.V. 2006
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7127678/
https://www.ncbi.nlm.nih.gov/pubmed/32288076
http://dx.doi.org/10.1016/j.physa.2006.03.004
_version_ 1783516413033447424
author Shen, Shiyi
Kai, Bo
Ruan, Jishou
Torin Huzil, J.
Carpenter, Eric
Tuszynski, Jack A.
collection PubMed
description Here, we describe a unique probabilistic evaluation of the 20 naturally occurring amino acids and their distributions within the Swiss-Prot and Complete Human Genebank databases. We have developed a computational technique that imposes both directionality and length constraints on searches for unique combinations of amino acids within protein sequences. Using statistical approaches, we have searched for all possible two- and three-residue motifs contained within these databases. This technique is based on the unusually high occurrence of a small number of these motifs compared with the expected probability of finding a specific residue grouping within a given database. Subsequent filtering of the search results to identify such unique combinations has provided several examples that can be used as markers to identify particular proteins within or across databases. We focus on the three motifs that we found to be of greatest interest: CC, CM, and CCM, a combination of the two. Each occurs either more or less frequently than would be predicted from standard amino acid distributions within the entire human proteome.
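
The comparison the abstract describes, observed motif counts against the counts expected from single-residue frequencies, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simple independence model in which the expected probability of an ordered motif is the product of its residues' overall frequencies, and the function name `motif_enrichment` and the toy sequences are hypothetical. Motifs are treated as ordered, so CM and MC are counted separately, which is one way to read the "directionality" constraint.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_enrichment(sequences, k=2):
    """Compare observed and expected counts of ordered k-residue motifs.

    Assumption (not from the source): residues are independent, so the
    expected probability of a motif is the product of the database-wide
    frequencies of its residues.
    Returns {motif: (observed, expected, observed / expected)}.
    """
    residue_counts = Counter()
    motif_counts = Counter()
    total_windows = 0
    for seq in sequences:
        residue_counts.update(seq)
        # Slide a window of length k along the sequence, left to right.
        for i in range(len(seq) - k + 1):
            motif_counts[seq[i:i + k]] += 1
            total_windows += 1

    total_residues = sum(residue_counts.values())
    if total_residues == 0:
        return {}
    freq = {aa: residue_counts[aa] / total_residues for aa in AMINO_ACIDS}

    result = {}
    for combo in product(AMINO_ACIDS, repeat=k):
        motif = "".join(combo)
        probability = 1.0
        for aa in combo:
            probability *= freq[aa]
        expected = probability * total_windows
        observed = motif_counts[motif]
        ratio = observed / expected if expected > 0 else float("nan")
        result[motif] = (observed, expected, ratio)
    return result


if __name__ == "__main__":
    # Toy sequences for illustration only; not taken from any database.
    toy = ["CCMCC", "ACMA"]
    pairs = motif_enrichment(toy, k=2)
    triples = motif_enrichment(toy, k=3)
    for motif in ("CC", "CM", "MC"):
        print(motif, pairs[motif])
    print("CCM", triples["CCM"])
```

A ratio well above 1 marks a motif that is over-represented relative to this null model, and a ratio well below 1 marks one that is under-represented. On real data a significance test would be needed before drawing conclusions.
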
format Online
Article
Text
id pubmed-7127678
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-7127678 2020-04-08 Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences. Shen, Shiyi; Kai, Bo; Ruan, Jishou; Torin Huzil, J.; Carpenter, Eric; Tuszynski, Jack A. Physica A. Article. Elsevier B.V. 2006-10-15 2006-04-03 /pmc/articles/PMC7127678/ /pubmed/32288076 http://dx.doi.org/10.1016/j.physa.2006.03.004 Text en Copyright © 2006 Elsevier B.V. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website.
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre, including this research content, immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database, with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences
topic Article