
Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering

As increasingly more genomes are sequenced, the vast majority of proteins may only be annotated computationally, given that experimental investigation is extremely costly. This highlights the need for computational methods to determine protein functions quickly and reliably. We believe dividing a protein...

Bibliographic Details
Main authors: Boari de Lima, Elisa; Meira, Wagner; de Melo-Minardi, Raquel Cardoso
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4922564/
https://www.ncbi.nlm.nih.gov/pubmed/27348631
http://dx.doi.org/10.1371/journal.pcbi.1005001
author Boari de Lima, Elisa
Meira, Wagner
de Melo-Minardi, Raquel Cardoso
author_sort Boari de Lima, Elisa
collection PubMed
description As increasingly more genomes are sequenced, the vast majority of proteins may only be annotated computationally, given that experimental investigation is extremely costly. This highlights the need for computational methods to determine protein functions quickly and reliably. We believe dividing a protein family into subtypes which share specific functions uncommon to the whole family reduces the function annotation problem’s complexity. Hence, this work’s purpose is to detect isofunctional subfamilies inside a family of unknown function, while identifying differentiating residues. Similarity between protein pairs according to various properties is interpreted as functional similarity evidence. Data are integrated using genetic programming and provided to a spectral clustering algorithm, which creates clusters of similar proteins. The proposed framework was applied to well-known protein families and to a family of unknown function, then compared to ASMC. Results showed our fully automated technique obtained better clusters than ASMC for two families, and equivalent results for two others, including one whose clusters were manually defined. Clusters produced by our framework showed great correspondence with the known subfamilies, and were more contrasting than those produced by ASMC. Additionally, for the families whose specificity-determining positions are known, such residues were among those our technique considered most important to differentiate a given group. When run with the crotonase and enolase SFLD superfamilies, the results showed great agreement with this gold standard. Best results consistently involved multiple data types, thus confirming our hypothesis that similarities according to different knowledge domains may be used as functional similarity evidence. Our main contributions are the proposed strategy for selecting and integrating data types, along with the ability to work with noisy and incomplete data; domain knowledge usage for detecting subfamilies in a family with different specificities, thus reducing the complexity of the experimental function characterization problem; and the identification of residues responsible for specificity.
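The pipeline summarized in the description (several pairwise-similarity matrices integrated into one, then passed to spectral clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed weights are a hypothetical stand-in for the combination that genetic programming would search for, the toy matrices and the helper names (block_similarity, integrate, spectral_bipartition) are invented for illustration, and the clustering step is simplified to a two-way split on the sign of the Fiedler vector.

```python
import numpy as np


def block_similarity(within, between, sizes=(3, 3)):
    """Toy similarity matrix with two groups (hypothetical data, not from the paper)."""
    n = sum(sizes)
    sim = np.full((n, n), float(between))
    start = 0
    for size in sizes:
        sim[start:start + size, start:start + size] = within
        start += size
    np.fill_diagonal(sim, 1.0)  # every protein is fully similar to itself
    return sim


def integrate(similarities, weights):
    """Weighted average of similarity matrices.

    Assumption: fixed weights replace the genetic-programming step.
    """
    weights = np.asarray(weights, dtype=float)
    stacked = np.stack(similarities)  # shape (n_types, n, n)
    combined = np.tensordot(weights, stacked, axes=1) / weights.sum()
    return (combined + combined.T) / 2.0  # enforce symmetry


def spectral_bipartition(affinity):
    """Two-way spectral split using the symmetric normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    laplacian = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    labels = (fiedler > 0).astype(int)
    # Eigenvector sign is arbitrary, so fix the first item to cluster 0.
    return 1 - labels if labels[0] == 1 else labels


# Two hypothetical data types, e.g. sequence-based and structure-based similarity.
sequence_sim = block_similarity(within=0.9, between=0.1)
structure_sim = block_similarity(within=0.8, between=0.2)

affinity = integrate([sequence_sim, structure_sim], weights=[0.6, 0.4])
labels = spectral_bipartition(affinity)
print(labels.tolist())
```

For more than two subfamilies, a common choice is to embed the items with the first k eigenvectors and run k-means on that embedding; the sign-based split is used here only to keep the sketch deterministic and dependency-light.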
format Online
Article
Text
id pubmed-4922564
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4922564 2016-07-18 Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering Boari de Lima, Elisa Meira, Wagner de Melo-Minardi, Raquel Cardoso PLoS Comput Biol Research Article Public Library of Science 2016-06-27 /pmc/articles/PMC4922564/ /pubmed/27348631 http://dx.doi.org/10.1371/journal.pcbi.1005001 Text en © 2016 Boari de Lima et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4922564/
https://www.ncbi.nlm.nih.gov/pubmed/27348631
http://dx.doi.org/10.1371/journal.pcbi.1005001