
Initial Cluster Analysis

We study a simple abstract problem motivated by a variety of applications in protein sequence analysis. Consider a string of 0s and 1s of length L, and containing D 1s. If we believe that some or all of the 1s may be clustered near the start of the sequence, which subset is the most significantly so...


Bibliographic Details
Main Authors: Altschul, Stephen F., Neuwald, Andrew F.
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc. 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5806593/
https://www.ncbi.nlm.nih.gov/pubmed/28771374
http://dx.doi.org/10.1089/cmb.2017.0050
author Altschul, Stephen F.
Neuwald, Andrew F.
collection PubMed
description We study a simple abstract problem motivated by a variety of applications in protein sequence analysis. Consider a string of 0s and 1s of length L, and containing D 1s. If we believe that some or all of the 1s may be clustered near the start of the sequence, which subset is the most significantly so clustered, and how significant is this clustering? We approach this question using the minimum description length principle and illustrate its application by analyzing residues that distinguish translational initiation and elongation factor guanosine triphosphatases (GTPases) from other P-loop GTPases. Within a structure of yeast elongation factor 1α, these residues form a significant cluster centered on a region implicated in guanine nucleotide exchange. Various biomedical questions may be cast as the abstract problem considered here.
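The problem stated in the description can be made concrete with a small sketch. The code below is not the authors' formulation: it assumes a simple two-part code in which a hypothetical "cut" position x splits the string into a dense prefix and a sparse remainder, and it charges log2(L - 1) bits for the cut and log2(D + 1) bits for the number of 1s before it. The function name `initial_cluster_mdl` and this particular cost model are illustrative assumptions; the published method may encode the model differently.

```python
from math import comb, log2


def initial_cluster_mdl(bits):
    """Find the prefix whose 1s are most compressibly clustered.

    Returns (cut, ones_in_prefix, bits_saved). A result of (0, 0, 0.0)
    means no prefix beats the null model under this illustrative code.
    """
    L = len(bits)
    D = sum(bits)
    # Null model: every placement of D ones among L positions is equally
    # likely, so the string costs log2 C(L, D) bits.
    null_cost = log2(comb(L, D))

    best = (0, 0, 0.0)
    d = 0  # number of ones seen in the first x positions
    for x in range(1, L):
        d += bits[x - 1]
        # Only consider prefixes that are denser in ones than the remainder,
        # i.e. d / x > (D - d) / (L - x), written without division.
        if d * (L - x) <= (D - d) * x:
            continue
        # Assumed model cost: which of L - 1 cuts, and which of D + 1 counts.
        model_cost = log2(L - 1) + log2(D + 1)
        # Data cost: place d ones in the prefix and D - d ones in the rest.
        data_cost = log2(comb(x, d)) + log2(comb(L - x, D - d))
        saved = null_cost - (model_cost + data_cost)
        if saved > best[2]:
            best = (x, d, saved)
    return best


if __name__ == "__main__":
    # Eight ones packed at the start of a length-40 string.
    print(initial_cluster_mdl([1] * 8 + [0] * 32))
```

Under these assumptions, a positive `bits_saved` plays the role of a significance score: each bit saved relative to the null model makes the clustered description roughly twice as favored. Restricting the search to prefixes denser than the remainder reflects the stated interest in 1s clustered near the start of the sequence.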
format Online
Article
Text
id pubmed-5806593
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Mary Ann Liebert, Inc.
record_format MEDLINE/PubMed
spelling pubmed-5806593 2018-02-12 Initial Cluster Analysis Altschul, Stephen F. Neuwald, Andrew F. J Comput Biol Research Articles We study a simple abstract problem motivated by a variety of applications in protein sequence analysis. Consider a string of 0s and 1s of length L, and containing D 1s. If we believe that some or all of the 1s may be clustered near the start of the sequence, which subset is the most significantly so clustered, and how significant is this clustering? We approach this question using the minimum description length principle and illustrate its application by analyzing residues that distinguish translational initiation and elongation factor guanosine triphosphatases (GTPases) from other P-loop GTPases. Within a structure of yeast elongation factor 1α, these residues form a significant cluster centered on a region implicated in guanine nucleotide exchange. Various biomedical questions may be cast as the abstract problem considered here. Mary Ann Liebert, Inc. 2018-02-01 2018-02-01 /pmc/articles/PMC5806593/ /pubmed/28771374 http://dx.doi.org/10.1089/cmb.2017.0050 Text en © Stephen F. Altschul and Andrew F. Neuwald, 2017. Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
topic Research Articles