An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to ident...

Bibliographic Details
Main authors: Knutson, Stacy T., Westwood, Brian M., Leuthaeuser, Janelle B., Turner, Brandon E., Nguyendac, Don, Shea, Gabrielle, Kumar, Kiran, Hayden, Julia D., Harper, Angela F., Brown, Shoshana D., Morris, John H., Ferrin, Thomas E., Babbitt, Patricia C., Fetrow, Jacquelyn S.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5368075/
https://www.ncbi.nlm.nih.gov/pubmed/28054422
http://dx.doi.org/10.1002/pro.3112
author Knutson, Stacy T.
Westwood, Brian M.
Leuthaeuser, Janelle B.
Turner, Brandon E.
Nguyendac, Don
Shea, Gabrielle
Kumar, Kiran
Hayden, Julia D.
Harper, Angela F.
Brown, Shoshana D.
Morris, John H.
Ferrin, Thomas E.
Babbitt, Patricia C.
Fetrow, Jacquelyn S.
author_sort Knutson, Stacy T.
collection PubMed
description Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
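The divisive loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `divisive_clustering`, `halve`, and `same_family` are hypothetical names, the split step and the self-identification test are toy stand-ins for active site profile clustering and the DASP2 search, and testing the starting set itself before any split is an assumption.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def divisive_clustering(
    structures: Sequence[str],
    split: Callable[[Cluster], List[Cluster]],
    self_identifies: Callable[[Cluster], bool],
) -> Tuple[List[Cluster], List[str]]:
    """Divide until every structure is in an accepted group or is a singlet.

    `split` and `self_identifies` are placeholders (assumptions) for the
    profile-based split and the DASP2 self-identification check.
    """
    groups: List[Cluster] = []
    singlets: List[str] = []
    queue = deque([tuple(structures)])
    while queue:
        cluster = queue.popleft()
        if not cluster:
            continue
        if len(cluster) == 1:
            singlets.append(cluster[0])
        elif self_identifies(cluster):
            # Candidate cluster passes the test: keep it, stop dividing it.
            groups.append(cluster)
        else:
            parts = [p for p in split(cluster) if p]
            if len(parts) < 2:
                # Guard against a split that fails to divide the cluster.
                singlets.extend(cluster)
            else:
                queue.extend(parts)
    return groups, singlets


def halve(cluster: Cluster) -> List[Cluster]:
    """Toy split: cut the ordered cluster into two halves."""
    half = (len(cluster) + 1) // 2
    return [cluster[:half], cluster[half:]]


def same_family(cluster: Cluster) -> bool:
    """Toy test: every label starts with the same family letter."""
    return len({name[0] for name in cluster}) == 1


if __name__ == "__main__":
    toy = ["A1", "A2", "A3", "B1", "B2", "C1"]
    print(divisive_clustering(toy, halve, same_family))
```

On the toy labels, the loop accepts the A and B groups and leaves `C1` as a singlet. In a real run the two callables would wrap the actual clustering and database search steps.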
format Online
Article
Text
id pubmed-5368075
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-53680752017-03-29 An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences Knutson, Stacy T. Westwood, Brian M. Leuthaeuser, Janelle B. Turner, Brandon E. Nguyendac, Don Shea, Gabrielle Kumar, Kiran Hayden, Julia D. Harper, Angela F. Brown, Shoshana D. Morris, John H. Ferrin, Thomas E. Babbitt, Patricia C. Fetrow, Jacquelyn S. Protein Sci Articles Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. 
Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. John Wiley and Sons Inc. 2017-03-08 2017-04 /pmc/articles/PMC5368075/ /pubmed/28054422 http://dx.doi.org/10.1002/pro.3112 Text en © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial (http://creativecommons.org/licenses/by-nc/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences
title_sort approach to functionally relevant clustering of the protein universe: active site profile‐based clustering of protein structures and sequences
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5368075/
https://www.ncbi.nlm.nih.gov/pubmed/28054422
http://dx.doi.org/10.1002/pro.3112