Subdivision of the MDR superfamily of medium-chain dehydrogenases/reductases through iterative hidden Markov model refinement

Bibliographic details
Main authors: Hedlund, Joel; Jörnvall, Hans; Persson, Bengt
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2976758/
https://www.ncbi.nlm.nih.gov/pubmed/20979641
http://dx.doi.org/10.1186/1471-2105-11-534

BACKGROUND: The Medium-chain Dehydrogenases/Reductases (MDR) form a protein superfamily whose size and complexity defeat traditional means of subclassification: it currently has over 15000 members in the databases, the pairwise sequence identity is typically around 25%, there are members from all kingdoms of life, the chain lengths vary, as does the oligomericity, and the members take part in a multitude of biological processes. Profile hidden Markov models (HMMs) are available for detecting MDR superfamily members, but none for determining which MDR family each protein belongs to. The current torrential influx of new sequence data enables elucidation of more and more protein families, at an increasingly fine granularity. However, gathering good-quality training data usually requires manual attention by experts and has therefore been the rate-limiting step for expanding the number of available models.

RESULTS: We have developed an automated algorithm for HMM refinement that produces stable and reliable models for protein families. This algorithm uses relationships found in data to generate confident seed sets. Using this algorithm we have produced HMMs for 86 distinct MDR families and 34 of their subfamilies, which can be used in automated annotation of new sequences. We find that MDR forms with two Zn2+ ions are in general dehydrogenases, while MDR forms with no Zn2+ are in general reductases. Furthermore, in Bacteria MDRs without Zn2+ are more frequent than those with Zn2+, while the opposite is true for eukaryotic MDRs, indicating that Zn2+ was recruited into the MDR superfamily after the initial separation of the kingdoms of life. We have also developed a web site, http://mdr-enzymes.org, that provides text and numeric search of various characterised MDR family properties, as well as sequence scan functions for reliable classification of novel MDR sequences.

CONCLUSIONS: Our method of refinement can readily be applied to create stable and reliable HMMs for both MDR and other protein families, and to confidently subdivide large and complex protein superfamilies. HMMs created using this algorithm correspond to evolutionary entities, which makes resolving overlapping models straightforward. The implementation and support scripts for running the algorithm on computer clusters are available as open source software, and the database files underlying the web site are freely downloadable. The web site also makes our findings directly useful to non-bioinformaticians.
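
The abstract describes the refinement only at a high level: a model is rebuilt from a seed set until the model and its seed set are stable. As a rough, non-authoritative illustration of that general "iterate until the seed set stops changing" idea, the Python sketch below swaps the profile HMM for a much simpler position-specific log-odds profile over pre-aligned, equal-length sequences. This is not the authors' algorithm: the functions build_profile, score and refine_seed, the uniform background, the pseudocount and the score threshold are all hypothetical choices made for illustration, and the paper's own criteria for building confident seed sets are not reproduced here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seqs, pseudocount=1.0):
    """Hypothetical stand-in for an HMM: one log-odds table per alignment column.

    Assumes all sequences are already aligned, ungapped and of equal length,
    and uses a uniform background frequency (both simplifying assumptions).
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, seq):
    """Sum of per-column log-odds; residues outside AMINO_ACIDS contribute 0."""
    return sum(column.get(res, 0.0) for column, res in zip(profile, seq))


def refine_seed(initial_seed, database, threshold, max_iter=20):
    """Rebuild the profile from the current seed and rescore the database
    until the set of hits equals the seed set (a fixed point).

    Returns (seed_set, converged).
    """
    seed = frozenset(initial_seed)
    for _ in range(max_iter):
        profile = build_profile(sorted(seed))
        hits = frozenset(s for s in database if score(profile, s) >= threshold)
        if hits == seed:
            return seed, True   # model and seed set are stable
        if not hits:
            return seed, False  # threshold too strict; keep the last non-empty seed
        seed = hits
    return seed, False          # no fixed point within max_iter rounds
```

For example, refine_seed({"ACDEFG"}, ["ACDEFG", "ACDEFH", "ACDEYG", "WWWWWW", "PPPPPP"], threshold=2.0) grows the seed to the three related sequences on the first pass and stops on the second, when rescoring returns the same set. Requiring an exact fixed point is the simplest stability test; a real pipeline would also need gapped alignments and a principled score cutoff.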

Source: BMC Bioinformatics, Research Article, published 2010-10-27. Copyright ©2010 Hedlund et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.