Predicting functional sites with an automated algorithm suitable for heterogeneous datasets

BACKGROUND: In a previous report (La et al., Proteins, 2005), we have demonstrated that the identification of phylogenetic motifs, protein sequence fragments conserving the overall familial phylogeny, represents a promising approach for sequence/function annotation. Across a structurally and function...

Bibliographic Details
Main authors: La, David; Livesay, Dennis R
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1142304/
https://www.ncbi.nlm.nih.gov/pubmed/15890082
http://dx.doi.org/10.1186/1471-2105-6-116
collection PubMed
description BACKGROUND: In a previous report (La et al., Proteins, 2005), we have demonstrated that the identification of phylogenetic motifs, protein sequence fragments conserving the overall familial phylogeny, represents a promising approach for sequence/function annotation. Across a structurally and functionally heterogeneous dataset, phylogenetic motifs have been demonstrated to correspond to a wide variety of functional site archetypes, including those defined by surface loops, active site clefts, and less exposed regions. However, in our original demonstration of the technique, phylogenetic motif identification is dependent upon a manually determined similarity threshold, prohibiting large-scale application of the technique. RESULTS: In this report, we present an algorithmic approach that determines thresholds without human subjectivity. The approach relies on significant raw data preprocessing to improve signal detection. Subsequently, Partition Around Medoids Clustering (PAMC) of the similarity scores assesses sequence fragments where functional annotation remains in question. The accuracy of the approach is confirmed through comparisons to our previous (manual) results and structural analyses. Triosephosphate isomerase and arginyl-tRNA synthetase are discussed as exemplar cases. A quantitative functional site prediction assessment algorithm indicates that the phylogenetic motif predictions, which require sequence information only, are nearly as good as those from evolutionary trace methods that do incorporate structure. CONCLUSION: The automated threshold detection algorithm has been incorporated into MINER, our web-based phylogenetic motif identification server. MINER is freely available on the web at . Pre-calculated functional site predictions of the COG database and an implementation of the threshold detection algorithm, in the R statistical language, can also be accessed at the website.
format Text
id pubmed-1142304
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1142304 2005-06-03 Predicting functional sites with an automated algorithm suitable for heterogeneous datasets La, David Livesay, Dennis R BMC Bioinformatics Software BioMed Central 2005-05-13 /pmc/articles/PMC1142304/ /pubmed/15890082 http://dx.doi.org/10.1186/1471-2105-6-116 Text en Copyright © 2005 La and Livesay; licensee BioMed Central Ltd.
topic Software