A linear time algorithm for detecting long genomic regions enriched with a specific combination of epigenetic states

BACKGROUND: Epigenetic modifications are essential for controlling gene expression. Recent studies have shown that not only single epigenetic modifications but also combinations of multiple epigenetic modifications play vital roles in gene regulation. A striking example is the long hypomethylated re...

Bibliographic Details
Main Authors: Ichikawa, Kazuki; Morishita, Shinichi
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331722/
https://www.ncbi.nlm.nih.gov/pubmed/25708947
http://dx.doi.org/10.1186/1471-2164-16-S2-S8
_version_ 1782357766876168192
author Ichikawa, Kazuki
Morishita, Shinichi
author_facet Ichikawa, Kazuki
Morishita, Shinichi
author_sort Ichikawa, Kazuki
collection PubMed
description BACKGROUND: Epigenetic modifications are essential for controlling gene expression. Recent studies have shown that not only single epigenetic modifications but also combinations of multiple epigenetic modifications play vital roles in gene regulation. A striking example is the long hypomethylated regions enriched with modified H3K27me3 (called "K27HMD" regions), which are exposed to suppress the expression of key developmental genes relevant to cellular development and differentiation during embryonic stages in vertebrates. It is thus biologically important to develop an effective optimization algorithm for detecting long DNA regions (e.g., >4 kbp in size) that harbor a specific combination of epigenetic modifications (e.g., K27HMD regions). However, to date, optimization algorithms for these purposes have received little attention, and available methods are still heuristic and ad hoc. RESULTS: In this paper, we propose a linear time algorithm for calculating a set of non-overlapping regions that maximizes the sum of similarities between the vector of focal epigenetic states and the vectors of raw epigenetic states at DNA positions in the set of regions. The average elapsed time to process the epigenetic data of any human chromosome was less than 2 seconds on an Intel Xeon CPU. To demonstrate the effectiveness of the algorithm, we estimated large K27HMD regions in the medaka and human genomes using our method, ChromHMM, and a heuristic method. CONCLUSIONS: We confirmed the advantages of our method over the two other methods. Our method is flexible enough to handle other types of epigenetic combinations. The program that implements the method is called "CSMinfinder" and is available at: http://mlab.cb.k.u-tokyo.ac.jp/~ichikawa/Segmentation/
format Online
Article
Text
id pubmed-4331722
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4331722 2015-03-19 A linear time algorithm for detecting long genomic regions enriched with a specific combination of epigenetic states Ichikawa, Kazuki Morishita, Shinichi BMC Genomics Proceedings BioMed Central 2015-01-21 /pmc/articles/PMC4331722/ /pubmed/25708947 http://dx.doi.org/10.1186/1471-2164-16-S2-S8 Text en Copyright © 2015 Ichikawa and Morishita; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Proceedings
Ichikawa, Kazuki
Morishita, Shinichi
A linear time algorithm for detecting long genomic regions enriched with a specific combination of epigenetic states
title A linear time algorithm for detecting long genomic regions enriched with a specific combination of epigenetic states
title_sort linear time algorithm for detecting long genomic regions enriched with a specific combination of epigenetic states
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331722/
https://www.ncbi.nlm.nih.gov/pubmed/25708947
http://dx.doi.org/10.1186/1471-2164-16-S2-S8
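
The optimization problem stated in the description (choose non-overlapping regions that maximize a sum of per-position similarities) can be illustrated with a short sketch. This is not the authors' CSMinfinder implementation; it is a textbook dynamic program written under assumptions that are not in the record: the similarity at each position has already been reduced to one number in `scores` (positive where the raw states resemble the focal states, negative otherwise), every region must span at least `min_len` positions, and the names `best_regions`, `scores`, and `min_len` are hypothetical.

```python
from itertools import accumulate


def best_regions(scores, min_len):
    """Pick non-overlapping regions, each at least min_len positions long,
    that maximize the total score. Regions are half-open (start, end) pairs.

    Illustrative dynamic program only; the per-position scores and the
    minimum-length constraint are assumptions, not the published method.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    n = len(scores)
    prefix = [0.0] + list(accumulate(scores))  # prefix[i] = sum of scores[:i]
    best = [0.0] * (n + 1)       # best[i] = optimum using positions [0, i)
    choice = [None] * (n + 1)    # start of the region ending at i, if any
    # Running maximum of best[j] - prefix[j] over all j <= i - min_len,
    # so each position is handled in constant time (linear overall).
    m_val, m_arg = float("-inf"), None
    for i in range(1, n + 1):
        j = i - min_len
        if j >= 0:
            cand = best[j] - prefix[j]
            if cand > m_val:
                m_val, m_arg = cand, j
        best[i] = best[i - 1]                  # option 1: no region ends at i
        if m_arg is not None and prefix[i] + m_val > best[i]:
            best[i] = prefix[i] + m_val        # option 2: region [m_arg, i)
            choice[i] = m_arg
    # Walk back through the recorded choices to recover the regions.
    regions = []
    i = n
    while i > 0:
        if choice[i] is None:
            i -= 1
        else:
            regions.append((choice[i], i))
            i = choice[i]
    regions.reverse()
    return best[n], regions
```

For example, `best_regions([1, 1, -5, 2, 2, 2, -1], 2)` selects the regions `(0, 2)` and `(3, 6)` with a total score of 8, skipping the strongly negative position between them. Raising `min_len` to 3 leaves only `(3, 6)`, because the first two positions no longer form a long enough region on their own.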