Some remarks on protein attribute prediction and pseudo amino acid composition
With the accomplishment of human genome sequencing, the number of sequence-known proteins has increased explosively. In contrast, the pace is much slower in determining their biological attributes. As a consequence, the gap between sequence-known proteins and attribute-known proteins has become incr...
Main author: | Chou, Kuo-Chen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Ltd. 2011 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7125570/ https://www.ncbi.nlm.nih.gov/pubmed/21168420 http://dx.doi.org/10.1016/j.jtbi.2010.12.024 |
_version_ | 1783515973335121920 |
---|---|
author | Chou, Kuo-Chen |
author_facet | Chou, Kuo-Chen |
author_sort | Chou, Kuo-Chen |
collection | PubMed |
description | With the accomplishment of human genome sequencing, the number of sequence-known proteins has increased explosively. In contrast, the pace is much slower in determining their biological attributes. As a consequence, the gap between sequence-known proteins and attribute-known proteins has become increasingly large. The unbalanced situation, which has critically limited our ability to timely utilize the newly discovered proteins for basic research and drug development, has called for developing computational methods or high-throughput automated tools for fast and reliably identifying various attributes of uncharacterized proteins based on their sequence information alone. Actually, during the last two decades or so, many methods in this regard have been established in hope to bridge such a gap. In the course of developing these methods, the following things were often needed to consider: (1) benchmark dataset construction, (2) protein sample formulation, (3) operating algorithm (or engine), (4) anticipated accuracy, and (5) web-server establishment. In this review, we are to discuss each of the five procedures, with a special focus on the introduction of pseudo amino acid composition (PseAAC), its different modes and applications as well as its recent development, particularly in how to use the general formulation of PseAAC to reflect the core and essential features that are deeply hidden in complicated protein sequences. |
format | Online Article Text |
id | pubmed-7125570 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Elsevier Ltd. |
record_format | MEDLINE/PubMed |
spelling | pubmed-71255702020-04-06 Some remarks on protein attribute prediction and pseudo amino acid composition Chou, Kuo-Chen J Theor Biol Article With the accomplishment of human genome sequencing, the number of sequence-known proteins has increased explosively. In contrast, the pace is much slower in determining their biological attributes. As a consequence, the gap between sequence-known proteins and attribute-known proteins has become increasingly large. The unbalanced situation, which has critically limited our ability to timely utilize the newly discovered proteins for basic research and drug development, has called for developing computational methods or high-throughput automated tools for fast and reliably identifying various attributes of uncharacterized proteins based on their sequence information alone. Actually, during the last two decades or so, many methods in this regard have been established in hope to bridge such a gap. In the course of developing these methods, the following things were often needed to consider: (1) benchmark dataset construction, (2) protein sample formulation, (3) operating algorithm (or engine), (4) anticipated accuracy, and (5) web-server establishment. In this review, we are to discuss each of the five procedures, with a special focus on the introduction of pseudo amino acid composition (PseAAC), its different modes and applications as well as its recent development, particularly in how to use the general formulation of PseAAC to reflect the core and essential features that are deeply hidden in complicated protein sequences. Elsevier Ltd. 2011-03-21 2010-12-17 /pmc/articles/PMC7125570/ /pubmed/21168420 http://dx.doi.org/10.1016/j.jtbi.2010.12.024 Text en Copyright © 2010 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. 
The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Chou, Kuo-Chen Some remarks on protein attribute prediction and pseudo amino acid composition |
title | Some remarks on protein attribute prediction and pseudo amino acid composition |
title_full | Some remarks on protein attribute prediction and pseudo amino acid composition |
title_fullStr | Some remarks on protein attribute prediction and pseudo amino acid composition |
title_full_unstemmed | Some remarks on protein attribute prediction and pseudo amino acid composition |
title_short | Some remarks on protein attribute prediction and pseudo amino acid composition |
title_sort | some remarks on protein attribute prediction and pseudo amino acid composition |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7125570/ https://www.ncbi.nlm.nih.gov/pubmed/21168420 http://dx.doi.org/10.1016/j.jtbi.2010.12.024 |
work_keys_str_mv | AT choukuochen someremarksonproteinattributepredictionandpseudoaminoacidcomposition |