Functional coherence metrics in protein families
BACKGROUND: Biological sequences, such as proteins, have been provided with annotations that assign functional information. These functional annotations are associations of proteins (or other biological sequences) with descriptors characterizing their biological roles. However, not all proteins are...
Main authors: | Bastos, Hugo P., Sousa, Lisete, Clarke, Luka A., Couto, Francisco M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4917928/ https://www.ncbi.nlm.nih.gov/pubmed/27338101 http://dx.doi.org/10.1186/s13326-016-0076-y |
_version_ | 1782439023875194880 |
---|---|
author | Bastos, Hugo P. Sousa, Lisete Clarke, Luka A. Couto, Francisco M. |
author_sort | Bastos, Hugo P. |
collection | PubMed |
description | BACKGROUND: Biological sequences, such as proteins, have been provided with annotations that assign functional information. These functional annotations are associations of proteins (or other biological sequences) with descriptors characterizing their biological roles. However, not all proteins are fully (or even at all) annotated. This annotation incompleteness limits our ability to make sound assertions about the functional coherence within sets of proteins. Annotation incompleteness is a problematic issue when measuring semantic functional similarity of biological sequences, since such measures can only capture a limited amount of all the semantic aspects the sequences may encompass. METHODS: Instead of relying solely on single (reductive) metrics, this work proposes a comprehensive approach for assessing functional coherence within protein sets. The approach entails using visualization and term enrichment techniques anchored in specific domain knowledge, such as a protein family. For that purpose we evaluate two novel functional coherence metrics, mUI and mGIC, that combine aspects of semantic similarity measures and term enrichment. RESULTS: These metrics were used to effectively capture and measure the local similarity cores within protein sets. Hence, these metrics coupled with visualization tools allow an improved grasp on three important functional annotation aspects: completeness, agreement and coherence. CONCLUSIONS: Measuring the functional similarity between proteins based on their annotations is a non-trivial task. Several metrics exist, but owing both to characteristics intrinsic to the nature of graphs and to extrinsic factors related to the annotation process, each measure can only capture certain functional annotation aspects of proteins. Hence, when trying to measure the functional coherence of a set of proteins, a single metric is too reductive. Therefore, it is valuable to be aware of how each employed similarity metric works and what similarity aspects it can best capture. Here we test the behaviour and resilience of some similarity metrics. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13326-016-0076-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4917928 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4917928 2016-06-24 Functional coherence metrics in protein families Bastos, Hugo P. Sousa, Lisete Clarke, Luka A. Couto, Francisco M. J Biomed Semantics Research BioMed Central 2016-06-23 /pmc/articles/PMC4917928/ /pubmed/27338101 http://dx.doi.org/10.1186/s13326-016-0076-y Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Functional coherence metrics in protein families |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4917928/ https://www.ncbi.nlm.nih.gov/pubmed/27338101 http://dx.doi.org/10.1186/s13326-016-0076-y |