
Enhanced protein domain discovery using taxonomy

BACKGROUND: It is well known that different species have different protein domain repertoires, and indeed that some protein domains are kingdom specific. This information has not yet been incorporated into statistical methods for finding domains in sequences of amino acids. RESULTS: We show that by...
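One way to picture the idea of feeding taxonomic information into a statistical domain finder is to treat the taxonomic distribution of a domain as a prior and add its log odds to a profile-HMM bit score. The sketch below is illustrative only and is not the authors' published method: the function names, the prior probabilities, and the threshold are all hypothetical values chosen for the example.

```python
import math


def taxon_adjusted_score(bit_score, p_domain_in_taxon, p_domain_background):
    """Return the bit score plus the log2 prior odds for the taxon.

    All three arguments are hypothetical inputs for this sketch:
    bit_score           -- log-odds score (in bits) from a profile HMM search
    p_domain_in_taxon   -- assumed frequency of the domain in the query's taxon
    p_domain_background -- assumed frequency of the domain across all taxa
    """
    if not (0.0 < p_domain_in_taxon <= 1.0):
        raise ValueError("p_domain_in_taxon must be in (0, 1]")
    if not (0.0 < p_domain_background <= 1.0):
        raise ValueError("p_domain_background must be in (0, 1]")
    # A domain that is common in this taxon gains bits; a rare one loses bits.
    return bit_score + math.log2(p_domain_in_taxon / p_domain_background)


def call_domain(bit_score, p_domain_in_taxon, p_domain_background, threshold):
    """Accept the hit when the taxon-adjusted score reaches the threshold."""
    adjusted = taxon_adjusted_score(
        bit_score, p_domain_in_taxon, p_domain_background
    )
    return adjusted >= threshold


if __name__ == "__main__":
    # Hypothetical borderline hit: 24 bits against a 25-bit threshold.
    # The same raw score is judged differently in two taxa.
    enriched = call_domain(24.0, 0.5, 0.125, threshold=25.0)
    depleted = call_domain(24.0, 0.125, 0.5, threshold=25.0)
    print("taxon where the domain is common:", enriched)
    print("taxon where the domain is rare:", depleted)
```

Under these assumptions, a borderline hit is accepted in a taxon where the domain is enriched and rejected where it is depleted, which is the kind of behaviour that could surface additional domain instances without lowering the threshold for every sequence.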


Bibliographic Details
Main Authors: Coin, Lachlan; Bateman, Alex; Durbin, Richard
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC434490/
https://www.ncbi.nlm.nih.gov/pubmed/15137915
http://dx.doi.org/10.1186/1471-2105-5-56
_version_ 1782121516041764864
author Coin, Lachlan
Bateman, Alex
Durbin, Richard
author_facet Coin, Lachlan
Bateman, Alex
Durbin, Richard
author_sort Coin, Lachlan
collection PubMed
description BACKGROUND: It is well known that different species have different protein domain repertoires, and indeed that some protein domains are kingdom specific. This information has not yet been incorporated into statistical methods for finding domains in sequences of amino acids. RESULTS: We show that by incorporating our understanding of the taxonomic distribution of specific protein domains, we can enhance domain recognition in protein sequences. We identify 4447 new instances of Pfam domains in the SP-TREMBL database using this technique, equivalent to the coverage increase given by the last 8.3% of Pfam families and to a 0.7% increase in the number of domain predictions. We use PSI-BLAST to cross-validate our new predictions. We also benchmark our approach using a SCOP test set of proteins of known structure, and demonstrate improvements relative to standard Hidden Markov model techniques. CONCLUSIONS: Explicitly including knowledge about the taxonomic distribution of protein domains can enhance protein domain recognition. Our method can also incorporate other context-specific domain distributions – such as domain co-occurrence and protein localisation.
format Text
id pubmed-434490
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-434490 2004-06-25 Enhanced protein domain discovery using taxonomy Coin, Lachlan Bateman, Alex Durbin, Richard BMC Bioinformatics Research Article BioMed Central 2004-05-11 /pmc/articles/PMC434490/ /pubmed/15137915 http://dx.doi.org/10.1186/1471-2105-5-56 Text en Copyright © 2004 Coin et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Research Article
Coin, Lachlan
Bateman, Alex
Durbin, Richard
Enhanced protein domain discovery using taxonomy
title Enhanced protein domain discovery using taxonomy
title_full Enhanced protein domain discovery using taxonomy
title_fullStr Enhanced protein domain discovery using taxonomy
title_full_unstemmed Enhanced protein domain discovery using taxonomy
title_short Enhanced protein domain discovery using taxonomy
title_sort enhanced protein domain discovery using taxonomy
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC434490/
https://www.ncbi.nlm.nih.gov/pubmed/15137915
http://dx.doi.org/10.1186/1471-2105-5-56
work_keys_str_mv AT coinlachlan enhancedproteindomaindiscoveryusingtaxonomy
AT batemanalex enhancedproteindomaindiscoveryusingtaxonomy
AT durbinrichard enhancedproteindomaindiscoveryusingtaxonomy