
Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison

Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, ‘enhancers’), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM...


Bibliographic Details
Main authors:	Kazemian, Majid, Zhu, Qiyun, Halfon, Marc S., Sinha, Saurabh
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2011
Subjects:	Computational Biology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3239187/ https://www.ncbi.nlm.nih.gov/pubmed/21821659 http://dx.doi.org/10.1093/nar/gkr621

author	Kazemian, Majid Zhu, Qiyun Halfon, Marc S. Sinha, Saurabh
collection	PubMed
description	Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, ‘enhancers’), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for ‘motif-blind’ CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to ‘supervise’ the search. We propose a new statistical method, based on ‘Interpolated Markov Models’, for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers.
format	Online Article Text
id	pubmed-3239187
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-3239187 2011-12-16 Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison Kazemian, Majid Zhu, Qiyun Halfon, Marc S. Sinha, Saurabh Nucleic Acids Res Computational Biology Oxford University Press 2011-12 2011-08-05 /pmc/articles/PMC3239187/ /pubmed/21821659 http://dx.doi.org/10.1093/nar/gkr621 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison
topic	Computational Biology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3239187/ https://www.ncbi.nlm.nih.gov/pubmed/21821659 http://dx.doi.org/10.1093/nar/gkr621
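
The description above mentions the widely used Hypergeometric test of gene set enrichment, which the authors extend to account for variability in intergenic lengths. As a point of reference only, the following is a minimal sketch of the standard (unextended) test. The function name and parameter names are hypothetical and chosen for illustration; the length-aware extension is not shown, because this record gives no details of it.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    population: total number of genes considered
    annotated:  genes in the population with the expression pattern of interest
    sample:     genes neighboring predicted CRMs (drawn without replacement)
    hits:       genes in the sample that carry the pattern
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie in [0, population]")

    max_hits = min(annotated, sample)
    if hits > max_hits:
        return 0.0

    # Smallest overlap that is possible at all; starting the sum here keeps
    # every binomial coefficient argument non-negative.
    lowest = max(hits, 0, sample - (population - annotated))

    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(lowest, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers, not taken from the paper.
    print(hypergeom_enrichment_pvalue(population=10, annotated=4, sample=3, hits=2))
```

This baseline treats every gene as equally likely to neighbor a prediction. According to the description, that is the assumption the authors' extended test relaxes by accounting for intergenic length.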

