A kingdom-specific protein domain HMM library for improved annotation of fungal genomes
BACKGROUND: Pfam is a general-purpose database of protein domain alignments and profile Hidden Markov Models (HMMs), which is very popular for the annotation of sequence data produced by genome sequencing projects. Pfam provides models that are often very general in terms of the taxa that they cover...
Main authors: | Alam, Intikhab; Hubbard, Simon J; Oliver, Stephen G; Rattray, Magnus |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1854895/ https://www.ncbi.nlm.nih.gov/pubmed/17425790 http://dx.doi.org/10.1186/1471-2164-8-97 |
_version_ | 1782133121513160704 |
---|---|
author | Alam, Intikhab; Hubbard, Simon J; Oliver, Stephen G; Rattray, Magnus |
author_facet | Alam, Intikhab; Hubbard, Simon J; Oliver, Stephen G; Rattray, Magnus |
author_sort | Alam, Intikhab |
collection | PubMed |
description | BACKGROUND: Pfam is a general-purpose database of protein domain alignments and profile Hidden Markov Models (HMMs), which is very popular for the annotation of sequence data produced by genome sequencing projects. Pfam provides models that are often very general in terms of the taxa that they cover, and it has previously been suggested that such general models may lack some of the specificity or selectivity that would be provided by kingdom-specific models. RESULTS: Here we present a general approach to create domain libraries of HMMs for sub-taxa of a kingdom. Taking fungal species as an example, we construct a domain library of HMMs (called Fungal Pfam or FPfam) using sequences from 30 genomes, consisting of 24 species from the ascomycetes group and two basidiomycetes, Ustilago maydis, a fungal pathogen of maize, and the white rot fungus Phanerochaete chrysosporium. In addition, we include the microsporidian Encephalitozoon cuniculi, an obligate intracellular parasite, and two non-fungal species, the oomycetes Phytophthora sojae and Phytophthora ramorum, both plant pathogens. We evaluate the performance in terms of coverage against the original 30 genomes used in training FPfam and against five more recently sequenced fungal genomes that can be considered as an independent test set. We show that kingdom-specific models such as FPfam can find instances of both novel and well-characterized domains, increase overall coverage, and detect more domains per sequence with typically higher bit scores than Pfam for the same domain families. An evaluation of the effect of changing E-values on the coverage shows that the performance of FPfam is consistent over the range of E-values applied. CONCLUSION: Kingdom-specific models are shown to provide improved coverage. However, as the models become more specific, some sequences found by Pfam may be missed by the models in FPfam, and some of the families represented in the test set are not present in FPfam. Therefore, we recommend that both general and specific libraries be used together for annotation, and we find that a significant improvement in coverage is achieved by using both Pfam and FPfam. |
format | Text |
id | pubmed-1854895 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1854895 2007-04-21 A kingdom-specific protein domain HMM library for improved annotation of fungal genomes Alam, Intikhab Hubbard, Simon J Oliver, Stephen G Rattray, Magnus BMC Genomics Research Article BACKGROUND: Pfam is a general-purpose database of protein domain alignments and profile Hidden Markov Models (HMMs), which is very popular for the annotation of sequence data produced by genome sequencing projects. Pfam provides models that are often very general in terms of the taxa that they cover, and it has previously been suggested that such general models may lack some of the specificity or selectivity that would be provided by kingdom-specific models. RESULTS: Here we present a general approach to create domain libraries of HMMs for sub-taxa of a kingdom. Taking fungal species as an example, we construct a domain library of HMMs (called Fungal Pfam or FPfam) using sequences from 30 genomes, consisting of 24 species from the ascomycetes group and two basidiomycetes, Ustilago maydis, a fungal pathogen of maize, and the white rot fungus Phanerochaete chrysosporium. In addition, we include the microsporidian Encephalitozoon cuniculi, an obligate intracellular parasite, and two non-fungal species, the oomycetes Phytophthora sojae and Phytophthora ramorum, both plant pathogens. We evaluate the performance in terms of coverage against the original 30 genomes used in training FPfam and against five more recently sequenced fungal genomes that can be considered as an independent test set. We show that kingdom-specific models such as FPfam can find instances of both novel and well-characterized domains, increase overall coverage, and detect more domains per sequence with typically higher bit scores than Pfam for the same domain families. An evaluation of the effect of changing E-values on the coverage shows that the performance of FPfam is consistent over the range of E-values applied. CONCLUSION: Kingdom-specific models are shown to provide improved coverage. However, as the models become more specific, some sequences found by Pfam may be missed by the models in FPfam, and some of the families represented in the test set are not present in FPfam. Therefore, we recommend that both general and specific libraries be used together for annotation, and we find that a significant improvement in coverage is achieved by using both Pfam and FPfam. BioMed Central 2007-04-10 /pmc/articles/PMC1854895/ /pubmed/17425790 http://dx.doi.org/10.1186/1471-2164-8-97 Text en Copyright © 2007 Alam et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Alam, Intikhab Hubbard, Simon J Oliver, Stephen G Rattray, Magnus A kingdom-specific protein domain HMM library for improved annotation of fungal genomes |
title | A kingdom-specific protein domain HMM library for improved annotation of fungal genomes |
title_full | A kingdom-specific protein domain HMM library for improved annotation of fungal genomes |
title_fullStr | A kingdom-specific protein domain HMM library for improved annotation of fungal genomes |
title_full_unstemmed | A kingdom-specific protein domain HMM library for improved annotation of fungal genomes |
title_short | A kingdom-specific protein domain HMM library for improved annotation of fungal genomes |
title_sort | kingdom-specific protein domain hmm library for improved annotation of fungal genomes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1854895/ https://www.ncbi.nlm.nih.gov/pubmed/17425790 http://dx.doi.org/10.1186/1471-2164-8-97 |
work_keys_str_mv | AT alamintikhab akingdomspecificproteindomainhmmlibraryforimprovedannotationoffungalgenomes AT hubbardsimonj akingdomspecificproteindomainhmmlibraryforimprovedannotationoffungalgenomes AT oliverstepheng akingdomspecificproteindomainhmmlibraryforimprovedannotationoffungalgenomes AT rattraymagnus akingdomspecificproteindomainhmmlibraryforimprovedannotationoffungalgenomes AT alamintikhab kingdomspecificproteindomainhmmlibraryforimprovedannotationoffungalgenomes AT hubbardsimonj kingdomspecificproteindomainhmmlibraryforimprovedannotationoffungalgenomes AT oliverstepheng kingdomspecificproteindomainhmmlibraryforimprovedannotationoffungalgenomes AT rattraymagnus kingdomspecificproteindomainhmmlibraryforimprovedannotationoffungalgenomes |