
One size does not fit all: On how Markov model order dictates performance of genomic sequence analyses

The structural simplicity and ability to capture serial correlations make Markov models a popular modeling choice in several genomic analyses, such as identification of motifs, genes and regulatory elements. A critical, yet relatively unexplored, issue is the determination of the order of the Markov...


Bibliographic Details
Main authors: Narlikar, Leelavati; Mehta, Nidhi; Galande, Sanjeev; Arjunwadkar, Mihir
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3562003/
https://www.ncbi.nlm.nih.gov/pubmed/23267010
http://dx.doi.org/10.1093/nar/gks1285
author Narlikar, Leelavati
Mehta, Nidhi
Galande, Sanjeev
Arjunwadkar, Mihir
collection PubMed
description The structural simplicity and ability to capture serial correlations make Markov models a popular modeling choice in several genomic analyses, such as identification of motifs, genes and regulatory elements. A critical, yet relatively unexplored, issue is the determination of the order of the Markov model. Most biological applications use a predetermined order for all data sets indiscriminately. Here, we show the vast variation in the performance of such applications with the order. To identify the ‘optimal’ order, we investigated two model selection criteria: Akaike information criterion and Bayesian information criterion (BIC). The BIC optimal order delivers the best performance for mammalian phylogeny reconstruction and motif discovery. Importantly, this order is different from orders typically used by many tools, suggesting that a simple additional step determining this order can significantly improve results. Further, we describe a novel classification approach based on BIC optimal Markov models to predict functionality of tissue-specific promoters. Our classifier discriminates between promoters active across 12 different tissues with remarkable accuracy, yielding 3 times the precision expected by chance. Application to the metagenomics problem of identifying the taxon from a short DNA fragment yields accuracies at least as high as the more complex mainstream methodologies, while retaining conceptual and computational simplicity.
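The order-selection step mentioned in the description can be illustrated with a minimal sketch. It assumes the textbook definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n for a fixed-order Markov model over the DNA alphabet, fitted by maximum likelihood; the record does not say how the authors implemented the step, so their procedure may differ. The function names (log_likelihood, n_params, aic, bic, best_order) and the start parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def log_likelihood(seq, order, start):
    """Maximized log-likelihood of seq[start:] under an order-`order` Markov model.

    Every candidate order is scored on the same positions (start >= order),
    so the likelihoods of different orders are directly comparable.
    """
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(start, len(seq)):
        ctx = seq[i - order:i]
        context_counts[ctx] += 1
        transition_counts[(ctx, seq[i])] += 1
    # Maximum-likelihood transition probability is count(ctx, x) / count(ctx).
    return sum(
        n * math.log(n / context_counts[ctx])
        for (ctx, _), n in transition_counts.items()
    )


def n_params(order, alphabet_size=4):
    """Free parameters: (alphabet_size - 1) probabilities per context."""
    return (alphabet_size - 1) * alphabet_size ** order


def aic(seq, order, start):
    """Akaike information criterion (textbook form, assumed here)."""
    return -2 * log_likelihood(seq, order, start) + 2 * n_params(order)


def bic(seq, order, start):
    """Bayesian information criterion (textbook form, assumed here)."""
    n = len(seq) - start  # number of scored positions
    return -2 * log_likelihood(seq, order, start) + n_params(order) * math.log(n)


def best_order(seq, max_order, criterion=bic):
    """Return the order in 0..max_order that minimizes the chosen criterion."""
    return min(range(max_order + 1), key=lambda m: criterion(seq, m, max_order))


if __name__ == "__main__":
    toy = "ACGT" * 50  # toy sequence in which each base determines the next
    print(best_order(toy, 3))
    print(best_order(toy, 3, criterion=aic))
```

Because the penalty term grows as 4^order, higher orders are chosen only when the sequence is long enough to support them, which is consistent with the description's point that a single predetermined order does not suit every data set.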
format Online
Article
Text
id pubmed-3562003
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3562003 2013-02-01 One size does not fit all: On how Markov model order dictates performance of genomic sequence analyses Narlikar, Leelavati Mehta, Nidhi Galande, Sanjeev Arjunwadkar, Mihir Nucleic Acids Res Computational Biology The structural simplicity and ability to capture serial correlations make Markov models a popular modeling choice in several genomic analyses, such as identification of motifs, genes and regulatory elements. A critical, yet relatively unexplored, issue is the determination of the order of the Markov model. Most biological applications use a predetermined order for all data sets indiscriminately. Here, we show the vast variation in the performance of such applications with the order. To identify the ‘optimal’ order, we investigated two model selection criteria: Akaike information criterion and Bayesian information criterion (BIC). The BIC optimal order delivers the best performance for mammalian phylogeny reconstruction and motif discovery. Importantly, this order is different from orders typically used by many tools, suggesting that a simple additional step determining this order can significantly improve results. Further, we describe a novel classification approach based on BIC optimal Markov models to predict functionality of tissue-specific promoters. Our classifier discriminates between promoters active across 12 different tissues with remarkable accuracy, yielding 3 times the precision expected by chance. Application to the metagenomics problem of identifying the taxon from a short DNA fragment yields accuracies at least as high as the more complex mainstream methodologies, while retaining conceptual and computational simplicity. Oxford University Press 2013-02 2012-12-24 /pmc/articles/PMC3562003/ /pubmed/23267010 http://dx.doi.org/10.1093/nar/gks1285 Text en © The Author(s) 2012. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
title One size does not fit all: On how Markov model order dictates performance of genomic sequence analyses
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3562003/
https://www.ncbi.nlm.nih.gov/pubmed/23267010
http://dx.doi.org/10.1093/nar/gks1285