Initial Cluster Analysis
We study a simple abstract problem motivated by a variety of applications in protein sequence analysis. Consider a string of 0s and 1s of length L, and containing D 1s. If we believe that some or all of the 1s may be clustered near the start of the sequence, which subset is the most significantly so...
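To make the problem statement concrete, here is a minimal sketch of one way a description-length comparison could be set up. It is not taken from the article: the function name `best_initial_cluster`, the uniform code for the two parameters (prefix length `n` and count `k`), and the restriction to prefixes ending at a 1 are all assumptions chosen for illustration. The null model encodes the positions of the D 1s among L positions; the cluster model encodes `n` and `k`, then the positions of `k` 1s within the first `n` positions and of the remaining `D - k` 1s within the last `L - n`.

```python
from math import comb, log2


def best_initial_cluster(bits):
    """Return (n, k, saving_bits) for the prefix that most shortens the description.

    Illustrative two-part code only; the parameter cost is an assumption,
    not the scheme used in the article.
    """
    L = len(bits)
    D = sum(bits)
    # Null model: the D ones may sit anywhere among the L positions.
    null_cost = log2(comb(L, D))
    best = (0, 0, 0.0)  # (0, 0, 0.0) means no prefix beats the null model
    k = 0
    for n, b in enumerate(bits, start=1):
        k += b
        if b != 1:
            continue  # assumption: only consider prefixes that end at a 1
        # Assumed uniform code for the parameters: n in 1..L and k in 1..D.
        param_cost = log2(L) + log2(D)
        # Data cost: k ones within the first n positions, D - k within the rest.
        data_cost = log2(comb(n, k)) + log2(comb(L - n, D - k))
        saving = null_cost - (param_cost + data_cost)
        if saving > best[2]:
            best = (n, k, saving)
    return best


if __name__ == "__main__":
    clustered = [1] * 8 + [0] * 92
    print(best_initial_cluster(clustered))
```

Under these assumptions, a positive saving means the cluster model describes the string in fewer bits than the null model, and the size of the saving gives a rough sense of how strong the evidence for clustering is.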
Main authors: | Altschul, Stephen F.; Neuwald, Andrew F. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Mary Ann Liebert, Inc. 2018 |
Subjects: | Research Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5806593/ https://www.ncbi.nlm.nih.gov/pubmed/28771374 http://dx.doi.org/10.1089/cmb.2017.0050 |
_version_ | 1783299155531137024 |
---|---|
author | Altschul, Stephen F. Neuwald, Andrew F. |
author_sort | Altschul, Stephen F. |
collection | PubMed |
description | We study a simple abstract problem motivated by a variety of applications in protein sequence analysis. Consider a string of 0s and 1s of length L, and containing D 1s. If we believe that some or all of the 1s may be clustered near the start of the sequence, which subset is the most significantly so clustered, and how significant is this clustering? We approach this question using the minimum description length principle and illustrate its application by analyzing residues that distinguish translational initiation and elongation factor guanosine triphosphatases (GTPases) from other P-loop GTPases. Within a structure of yeast elongation factor 1 [Formula: see text] , these residues form a significant cluster centered on a region implicated in guanine nucleotide exchange. Various biomedical questions may be cast as the abstract problem considered here. |
format | Online Article Text |
id | pubmed-5806593 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Mary Ann Liebert, Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-58065932018-02-12 Initial Cluster Analysis Altschul, Stephen F. Neuwald, Andrew F. J Comput Biol Research Articles We study a simple abstract problem motivated by a variety of applications in protein sequence analysis. Consider a string of 0s and 1s of length L, and containing D 1s. If we believe that some or all of the 1s may be clustered near the start of the sequence, which subset is the most significantly so clustered, and how significant is this clustering? We approach this question using the minimum description length principle and illustrate its application by analyzing residues that distinguish translational initiation and elongation factor guanosine triphosphatases (GTPases) from other P-loop GTPases. Within a structure of yeast elongation factor 1 [Formula: see text] , these residues form a significant cluster centered on a region implicated in guanine nucleotide exchange. Various biomedical questions may be cast as the abstract problem considered here. Mary Ann Liebert, Inc. 2018-02-01 2018-02-01 /pmc/articles/PMC5806593/ /pubmed/28771374 http://dx.doi.org/10.1089/cmb.2017.0050 Text en © Stephen F. Altschul and Andrew F. Neuwald, 2017. Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. |
spellingShingle | Research Articles Altschul, Stephen F. Neuwald, Andrew F. Initial Cluster Analysis |
title | Initial Cluster Analysis |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5806593/ https://www.ncbi.nlm.nih.gov/pubmed/28771374 http://dx.doi.org/10.1089/cmb.2017.0050 |