On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

BACKGROUND: Protein sequence similarities to any type of non-globular segment (coiled coils, low complexity regions, transmembrane regions, long loops, etc., where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properti...

Bibliographic Details
Main authors: Wong, Wing-Cheong, Maurer-Stroh, Sebastian, Eisenhaber, Birgit, Eisenhaber, Frank
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4061105/
https://www.ncbi.nlm.nih.gov/pubmed/24890864
http://dx.doi.org/10.1186/1471-2105-15-166
author Wong, Wing-Cheong
Maurer-Stroh, Sebastian
Eisenhaber, Birgit
Eisenhaber, Frank
collection PubMed
description BACKGROUND: Protein sequence similarities to any type of non-globular segment (coiled coils, low complexity regions, transmembrane regions, long loops, etc., where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources of mistaken homologies. Regretfully, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute for manual handling of these cases. Quantitative criteria are required to suppress function annotation transfer that results from false homology assignments. RESULTS: The sequence homology concept is based on the similarity comparison between structural elements, the basic building blocks that confer the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept to cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt.
CONCLUSIONS: Sequence similarity score dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison.
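The dissection idea in the description above can be illustrated with a minimal sketch. It is not the DissectHMMER implementation and does not reproduce HMMER2/3 scoring: the per-position score contributions, the segment boundaries, the `fold_critical` labels, and the fixed score threshold are all hypothetical placeholders chosen for illustration. The sketch only shows the bookkeeping: sum the contributions per segment class, then require the fold-critical part to pass the threshold on its own.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    """A stretch of alignment positions (hypothetical annotation)."""
    start: int           # first position, 0-based, inclusive
    end: int             # last position, exclusive
    fold_critical: bool  # True if the segment is assumed to carry the fold


def dissect_score(position_scores: Sequence[float],
                  segments: List[Segment]) -> Tuple[float, float]:
    """Split per-position contributions into (fold-critical, remaining) sums."""
    fold_part = 0.0
    other_part = 0.0
    for seg in segments:
        part = sum(position_scores[seg.start:seg.end])
        if seg.fold_critical:
            fold_part += part
        else:
            other_part += part
    return fold_part, other_part


def supports_homology(position_scores: Sequence[float],
                      segments: List[Segment],
                      threshold: float) -> bool:
    """Accept a hit only if the fold-critical contribution alone reaches
    the (assumed) significance threshold."""
    fold_part, _ = dissect_score(position_scores, segments)
    return fold_part >= threshold


# Hypothetical hit: most of the score comes from a non-globular segment.
position_scores = [2.0, 1.5, 3.0, 0.5, 4.0, 4.5, 5.0, 0.5]
segments = [
    Segment(start=0, end=3, fold_critical=True),   # e.g. a structured core
    Segment(start=3, end=8, fold_critical=False),  # e.g. a transmembrane run
]

fold_part, other_part = dissect_score(position_scores, segments)
print("total score:", sum(position_scores))
print("fold-critical part:", fold_part)
print("remaining part:", other_part)
print("accepted:", supports_homology(position_scores, segments, threshold=10.0))
```

In this made-up example the total score would pass the threshold, but the fold-critical part alone would not, which is the kind of hit the description proposes to reject. A real analysis would work with significance estimates rather than a fixed raw-score cutoff.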
format Online
Article
Text
id pubmed-4061105
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4061105 2014-06-30 BMC Bioinformatics Methodology Article BioMed Central 2014-06-02 /pmc/articles/PMC4061105/ /pubmed/24890864 http://dx.doi.org/10.1186/1471-2105-15-166 Text en
Copyright © 2014 Wong et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4061105/
https://www.ncbi.nlm.nih.gov/pubmed/24890864
http://dx.doi.org/10.1186/1471-2105-15-166