A kingdom-specific protein domain HMM library for improved annotation of fungal genomes

BACKGROUND: Pfam is a general-purpose database of protein domain alignments and profile Hidden Markov Models (HMMs), which is very popular for the annotation of sequence data produced by genome sequencing projects. Pfam provides models that are often very general in terms of the taxa that they cover...

Bibliographic Details
Main authors: Alam, Intikhab; Hubbard, Simon J; Oliver, Stephen G; Rattray, Magnus
Format: Text
Language: English
Published: BioMed Central 2007
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1854895/
https://www.ncbi.nlm.nih.gov/pubmed/17425790
http://dx.doi.org/10.1186/1471-2164-8-97
author Alam, Intikhab
Hubbard, Simon J
Oliver, Stephen G
Rattray, Magnus
collection PubMed
description BACKGROUND: Pfam is a general-purpose database of protein domain alignments and profile Hidden Markov Models (HMMs) that is widely used to annotate sequence data produced by genome sequencing projects. Pfam models are often very general in terms of the taxa that they cover, and it has previously been suggested that such general models may lack some of the specificity or selectivity that kingdom-specific models would provide. RESULTS: Here we present a general approach to creating domain libraries of HMMs for sub-taxa of a kingdom. Taking fungal species as an example, we construct a domain library of HMMs (called Fungal Pfam or FPfam) using sequences from 30 genomes, consisting of 24 species from the ascomycetes group and two basidiomycetes: Ustilago maydis, a fungal pathogen of maize, and the white rot fungus Phanerochaete chrysosporium. In addition, we include the microsporidian Encephalitozoon cuniculi, an obligate intracellular parasite, and two non-fungal species, the oomycetes Phytophthora sojae and Phytophthora ramorum, both plant pathogens. We evaluate performance in terms of coverage against the original 30 genomes used in training FPfam and against five more recently sequenced fungal genomes that can be considered an independent test set. We show that kingdom-specific models such as FPfam can find instances of both novel and well-characterized domains, increase overall coverage, and detect more domains per sequence, typically with higher bit scores than Pfam for the same domain families. An evaluation of the effect of changing E-values on coverage shows that the performance of FPfam is consistent over the range of E-values applied. CONCLUSION: Kingdom-specific models are shown to provide improved coverage. However, as the models become more specific, some sequences found by Pfam may be missed by the models in FPfam, and some of the families represented in the test set are not present in FPfam. Therefore, we recommend that general and specific libraries be used together for annotation; we find that using both Pfam and FPfam achieves a significant improvement in coverage.
format Text
id pubmed-1854895
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1854895 2007-04-21 A kingdom-specific protein domain HMM library for improved annotation of fungal genomes Alam, Intikhab Hubbard, Simon J Oliver, Stephen G Rattray, Magnus BMC Genomics Research Article BioMed Central 2007-04-10 /pmc/articles/PMC1854895/ /pubmed/17425790 http://dx.doi.org/10.1186/1471-2164-8-97 Text en Copyright © 2007 Alam et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A kingdom-specific protein domain HMM library for improved annotation of fungal genomes
topic Research Article