
Generalizations of Markov model to characterize biological sequences

BACKGROUND: The currently used kth-order Markov models estimate the probability of generating a single nucleotide conditional upon the immediately preceding (gap = 0) k units. However, this neither takes into account the joint dependency of multiple neighboring nucleotides, nor does it consider th...


Bibliographic Details
Main authors:	Wang, Junwen; Hannenhalli, Sridhar
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1236913/ https://www.ncbi.nlm.nih.gov/pubmed/16144548 http://dx.doi.org/10.1186/1471-2105-6-219

author	Wang, Junwen; Hannenhalli, Sridhar
collection	PubMed
description	BACKGROUND: The currently used kth-order Markov models estimate the probability of generating a single nucleotide conditional upon the immediately preceding (gap = 0) k units. However, this neither takes into account the joint dependency of multiple neighboring nucleotides, nor does it consider long-range dependencies with gap > 0. RESULT: We describe a configurable tool to explore generalizations of the standard Markov model. We evaluated whether sequence classification accuracy can be improved by using an alternative set of model parameters. The evaluation was done on four classes of biological sequences: CpG-poor promoters, all promoters, exons and nucleosome positioning sequences. Using di- and tri-nucleotides as the model unit significantly improved the sequence classification accuracy relative to the standard single-nucleotide model. In the case of nucleosome positioning sequences, optimal accuracy was achieved at a gap length of 4. Furthermore, in the plot of classification accuracy versus gap, a periodicity of 10–11 bp was observed, which might indicate structural preferences in the nucleosome positioning sequence. The tool is implemented in Java and is available for download at . CONCLUSION: Markov modeling is an important component of many sequence analysis tools. We have extended the standard Markov model to incorporate joint and long-range dependencies between the sequence elements. The proposed generalizations of the Markov model are likely to improve the overall accuracy of sequence analysis tools.
format	Text
id	pubmed-1236913
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1236913 2005-09-29 Generalizations of Markov model to characterize biological sequences. Wang, Junwen; Hannenhalli, Sridhar. BMC Bioinformatics. Software. BioMed Central, 2005-09-06. /pmc/articles/PMC1236913/ /pubmed/16144548 http://dx.doi.org/10.1186/1471-2105-6-219 Text en. Copyright © 2005 Wang and Hannenhalli; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Generalizations of Markov model to characterize biological sequences
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1236913/ https://www.ncbi.nlm.nih.gov/pubmed/16144548 http://dx.doi.org/10.1186/1471-2105-6-219
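
The abstract describes three configurable parameters: the model unit (mono-, di- or tri-nucleotide), the order k, and the gap between the conditioning units and the generated unit. The sketch below is one possible reading of that description, not the authors' Java tool. All names (`context_target_pairs`, `train`, `score`, `classify`) are hypothetical, and the sliding-window counting, the add-one pseudocount and the log-likelihood-ratio decision rule are assumptions that this record does not confirm.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def context_target_pairs(seq, unit=1, order=1, gap=0):
    """Yield (context, target) pairs from a sequence.

    target:  `unit` consecutive nucleotides starting at position i.
    context: the `order * unit` nucleotides that end `gap` positions
             before i (gap = 0 gives the standard Markov model).
    Windows slide by one nucleotide, so units overlap (an assumption).
    """
    ctx_len = order * unit
    for i in range(ctx_len + gap, len(seq) - unit + 1):
        ctx_end = i - gap
        yield seq[ctx_end - ctx_len:ctx_end], seq[i:i + unit]


def train(seqs, unit=1, order=1, gap=0, pseudocount=1.0):
    """Count context -> target transitions; return a log-probability function."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for ctx, tgt in context_target_pairs(s, unit, order, gap):
            counts[ctx][tgt] += 1.0
    n_targets = len(ALPHABET) ** unit  # number of possible target units

    def log_prob(ctx, tgt):
        row = counts.get(ctx, {})  # unseen contexts fall back to uniform
        total = sum(row.values()) + pseudocount * n_targets
        return math.log((row.get(tgt, 0.0) + pseudocount) / total)

    return log_prob


def score(seq, model, unit=1, order=1, gap=0):
    """Sum of log conditional probabilities over all windows of `seq`."""
    return sum(model(c, t) for c, t in context_target_pairs(seq, unit, order, gap))


def classify(seq, pos_model, neg_model, **params):
    """True if the positive-class model scores `seq` higher than the negative one."""
    return score(seq, pos_model, **params) > score(seq, neg_model, **params)
```

Under these assumptions, scanning `gap` over a range of values and recording classification accuracy at each setting would give the kind of accuracy-versus-gap plot the abstract mentions; the models being compared must be trained and scored with the same `unit`, `order` and `gap`.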

