An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences
Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to ident...
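Main authors: | Knutson, Stacy T., Westwood, Brian M., Leuthaeuser, Janelle B., Turner, Brandon E., Nguyendac, Don, Shea, Gabrielle, Kumar, Kiran, Hayden, Julia D., Harper, Angela F., Brown, Shoshana D., Morris, John H., Ferrin, Thomas E., Babbitt, Patricia C., Fetrow, Jacquelyn S. |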
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5368075/ https://www.ncbi.nlm.nih.gov/pubmed/28054422 http://dx.doi.org/10.1002/pro.3112 |
_version_ | 1782517855199166464 |
---|---|
author | Knutson, Stacy T. Westwood, Brian M. Leuthaeuser, Janelle B. Turner, Brandon E. Nguyendac, Don Shea, Gabrielle Kumar, Kiran Hayden, Julia D. Harper, Angela F. Brown, Shoshana D. Morris, John H. Ferrin, Thomas E. Babbitt, Patricia C. Fetrow, Jacquelyn S. |
author_facet | Knutson, Stacy T. Westwood, Brian M. Leuthaeuser, Janelle B. Turner, Brandon E. Nguyendac, Don Shea, Gabrielle Kumar, Kiran Hayden, Julia D. Harper, Angela F. Brown, Shoshana D. Morris, John H. Ferrin, Thomas E. Babbitt, Patricia C. Fetrow, Jacquelyn S. |
author_sort | Knutson, Stacy T. |
collection | PubMed |
description | Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. |
format | Online Article Text |
id | pubmed-5368075 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-53680752017-03-29 An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences Knutson, Stacy T. Westwood, Brian M. Leuthaeuser, Janelle B. Turner, Brandon E. Nguyendac, Don Shea, Gabrielle Kumar, Kiran Hayden, Julia D. Harper, Angela F. Brown, Shoshana D. Morris, John H. Ferrin, Thomas E. Babbitt, Patricia C. Fetrow, Jacquelyn S. Protein Sci Articles Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. 
Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. John Wiley and Sons Inc. 2017-03-08 2017-04 /pmc/articles/PMC5368075/ /pubmed/28054422 http://dx.doi.org/10.1002/pro.3112 Text en © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial (http://creativecommons.org/licenses/by-nc/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. |
spellingShingle | Articles Knutson, Stacy T. Westwood, Brian M. Leuthaeuser, Janelle B. Turner, Brandon E. Nguyendac, Don Shea, Gabrielle Kumar, Kiran Hayden, Julia D. Harper, Angela F. Brown, Shoshana D. Morris, John H. Ferrin, Thomas E. Babbitt, Patricia C. Fetrow, Jacquelyn S. An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences |
title | An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences |
title_sort | approach to functionally relevant clustering of the protein universe: active site profile‐based clustering of protein structures and sequences |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5368075/ https://www.ncbi.nlm.nih.gov/pubmed/28054422 http://dx.doi.org/10.1002/pro.3112 |
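
The description field above outlines an iterative, divisive process: candidate clusters are produced, each is tested for self-identification, and division continues until every structure is in a functionally relevant group or is a singlet. The following is a minimal Python sketch of that control flow only, not the authors' TuLIP implementation. The function names, the `split` and `self_identifies` callables, and the toy signatures are all hypothetical stand-ins; in the published method the self-identification test is a DASP2 search, which is not reproduced here. The sketch also assumes that the starting set is itself tested before any division.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

Structure = Tuple[str, str]          # (name, toy active-site signature); hypothetical
Cluster = Tuple[Structure, ...]


def divisive_clustering(
    structures: Sequence[Structure],
    split: Callable[[Cluster], List[Cluster]],
    self_identifies: Callable[[Cluster], bool],
) -> Tuple[List[Cluster], List[Structure]]:
    """Divide until every structure is in an accepted group or is a singlet."""
    groups: List[Cluster] = []
    singlets: List[Structure] = []
    queue = deque([tuple(structures)]) if structures else deque()
    while queue:
        cluster = queue.popleft()
        if len(cluster) == 1:
            singlets.append(cluster[0])
        elif self_identifies(cluster):
            # Stand-in for the DASP2 self-identification check.
            groups.append(cluster)
        else:
            parts = [p for p in split(cluster) if p]
            if len(parts) < 2:
                # No progress from the splitter: fall back to singlets
                # so the loop always terminates.
                parts = [(s,) for s in cluster]
            queue.extend(parts)
    return groups, singlets


def toy_self_identifies(cluster: Cluster) -> bool:
    # Toy criterion (assumption): all members share one signature.
    return len({sig for _, sig in cluster}) == 1


def toy_split(cluster: Cluster) -> List[Cluster]:
    # Toy splitter (assumption): members matching the first signature vs. the rest.
    ref = cluster[0][1]
    same = tuple(s for s in cluster if s[1] == ref)
    rest = tuple(s for s in cluster if s[1] != ref)
    return [same, rest]


TOY_STRUCTURES: List[Structure] = [
    ("A1", "KxD"), ("A2", "KxD"),
    ("B1", "HxE"), ("B2", "HxE"),
    ("C1", "RxY"),
]

if __name__ == "__main__":
    groups, singlets = divisive_clustering(
        TOY_STRUCTURES, toy_split, toy_self_identifies
    )
    print("groups:", groups)
    print("singlets:", singlets)
```

On the toy data, the two structures sharing `KxD` and the two sharing `HxE` end up as groups, and `C1` is left as a singlet. Passing the splitter and the acceptance test in as callables keeps the loop independent of how candidate clusters are actually generated and evaluated.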