
Physicochemical property distributions for accurate and rapid pairwise protein homology detection

BACKGROUND: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence pr...


Bibliographic Details
Main authors: Webb-Robertson, Bobbie-Jo M, Ratuiste, Kyle G, Oehmen, Christopher S
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851606/
https://www.ncbi.nlm.nih.gov/pubmed/20302613
http://dx.doi.org/10.1186/1471-2105-11-145
_version_ 1782179882300604416
author Webb-Robertson, Bobbie-Jo M
Ratuiste, Kyle G
Oehmen, Christopher S
author_sort Webb-Robertson, Bobbie-Jo M
collection PubMed
description BACKGROUND: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection. RESULTS: We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from 4-mers of the sequence and normalized based on the null distribution of the property over all possible 4-mers. With this approach there is little computational cost associated with the transformation of the protein into feature space, and overall performance in terms of remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for the task of pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost. CONCLUSIONS: A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.
format Text
id pubmed-2851606
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2851606 2010-04-09 Physicochemical property distributions for accurate and rapid pairwise protein homology detection Webb-Robertson, Bobbie-Jo M Ratuiste, Kyle G Oehmen, Christopher S BMC Bioinformatics Methodology article
BioMed Central 2010-03-19 /pmc/articles/PMC2851606/ /pubmed/20302613 http://dx.doi.org/10.1186/1471-2105-11-145 Text en Copyright ©2010 Webb-Robertson et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Physicochemical property distributions for accurate and rapid pairwise protein homology detection
topic Methodology article
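The feature construction summarized in the abstract (property scores over the 4-mers of a sequence, normalized against the null distribution over all possible 4-mers) can be illustrated with a minimal Python sketch. The record does not say which properties, scoring rule, normalization, or binning the authors used, so everything specific here is an assumption made for illustration: the Kyte-Doolittle hydropathy scale as the single property, a 4-mer score equal to the sum of its residue values, a z-score against the mean and standard deviation of all 20^4 4-mers, and a 10-bin histogram as the feature vector. The function names and the toy sequence are hypothetical.

```python
import math
from itertools import product

# Kyte-Doolittle hydropathy values. Using this scale is an assumption;
# the record does not name the properties used in the paper.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
K = 4  # k-mer length stated in the abstract


def kmer_score(kmer):
    """Score a k-mer as the sum of its residue values (assumed rule)."""
    return sum(KD[aa] for aa in kmer)


def null_stats(k=K):
    """Mean and standard deviation of the score over all 20**k k-mers."""
    scores = [kmer_score(kmer) for kmer in product(KD, repeat=k)]
    n = len(scores)
    mu = math.fsum(scores) / n
    var = math.fsum((s - mu) ** 2 for s in scores) / n
    return mu, math.sqrt(var)


def feature_vector(seq, mu, sigma, n_bins=10, lo=-3.0, hi=3.0):
    """Histogram of z-scored 4-mer scores, normalized to sum to 1.

    The bin count and range are illustrative choices, not the paper's.
    Windows containing non-standard residues are skipped.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    used = 0
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if any(aa not in KD for aa in kmer):
            continue
        z = (kmer_score(kmer) - mu) / sigma
        # Clamp out-of-range z-scores into the first or last bin.
        idx = min(n_bins - 1, max(0, int((z - lo) // width)))
        counts[idx] += 1
        used += 1
    if used == 0:
        return [0.0] * n_bins
    return [c / used for c in counts]


mu, sigma = null_stats()
# Toy sequence, purely illustrative.
print(feature_vector("MKTAYIAKQRQISFVKSHFSRQ", mu, sigma))
```

Because the null statistics depend only on the property scale, they can be computed once and reused for every sequence, which is consistent with the low transformation cost the abstract reports. A vector of this kind would then serve as input to an SVM or another pairwise classifier.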