HOME: a histogram based machine learning approach for effective identification of differentially methylated regions

BACKGROUND: The development of whole genome bisulfite sequencing has made it possible to identify methylation differences at single base resolution throughout an entire genome. However, a persistent challenge in DNA methylome analysis is the accurate identification of differentially methylated regio...

Bibliographic Details
Main Authors: Srivastava, Akanksha, Karpievitch, Yuliya V., Eichten, Steven R., Borevitz, Justin O., Lister, Ryan
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6521357/
https://www.ncbi.nlm.nih.gov/pubmed/31096906
http://dx.doi.org/10.1186/s12859-019-2845-y
_version_ 1783418939039023104
author Srivastava, Akanksha
Karpievitch, Yuliya V.
Eichten, Steven R.
Borevitz, Justin O.
Lister, Ryan
author_facet Srivastava, Akanksha
Karpievitch, Yuliya V.
Eichten, Steven R.
Borevitz, Justin O.
Lister, Ryan
author_sort Srivastava, Akanksha
collection PubMed
description BACKGROUND: The development of whole genome bisulfite sequencing has made it possible to identify methylation differences at single base resolution throughout an entire genome. However, a persistent challenge in DNA methylome analysis is the accurate identification of differentially methylated regions (DMRs) between samples. Sensitive and specific identification of DMRs among different conditions requires accurate and efficient algorithms, and while various tools have been developed to tackle this problem, they frequently suffer from inaccurate DMR boundary identification and high false positive rate. RESULTS: We present a novel Histogram Of MEthylation (HOME) based method that takes into account the inherent difference in the distribution of methylation levels between DMRs and non-DMRs to discriminate between the two using a Support Vector Machine. We show that generated features used by HOME are dataset-independent such that a classifier trained on, for example, a mouse methylome training set of regions of differentially accessible chromatin, can be applied to any other organism’s dataset and identify accurate DMRs. We demonstrate that DMRs identified by HOME exhibit higher association with biologically relevant genes, processes, and regulatory events compared to the existing methods. Moreover, HOME provides additional functionalities lacking in most of the current DMR finders such as DMR identification in non-CG context and time series analysis. HOME is freely available at https://github.com/ListerLab/HOME. CONCLUSION: HOME produces more accurate DMRs than the current state-of-the-art methods on both simulated and biological datasets. The broad applicability of HOME to identify accurate DMRs in genomic data from any organism will have a significant impact upon expanding our knowledge of how DNA methylation dynamics affect cell development and differentiation. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2845-y) contains supplementary material, which is available to authorized users.
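The description above says HOME uses the distribution of methylation levels to separate DMRs from non-DMRs. As a rough illustration only, the sketch below builds a normalized histogram of per-cytosine methylation differences for one genomic window. It is not HOME's actual feature set: the function name, the bin count, the use of absolute differences, and the example values are all assumptions made for this sketch, and the Support Vector Machine step is left out.

```python
from typing import List, Sequence


def methylation_histogram(
    levels_a: Sequence[float],
    levels_b: Sequence[float],
    n_bins: int = 10,  # assumed bin count, not a value taken from HOME
) -> List[float]:
    """Normalized histogram of absolute methylation differences in one window.

    levels_a and levels_b hold methylation levels (0.0 to 1.0) for the same
    cytosine positions in two samples.
    """
    if len(levels_a) != len(levels_b):
        raise ValueError("both samples must cover the same positions")
    if not levels_a:
        raise ValueError("window must contain at least one position")

    counts = [0] * n_bins
    for a, b in zip(levels_a, levels_b):
        diff = abs(a - b)  # always within [0, 1]
        # Clamp so that a difference of exactly 1.0 falls in the last bin.
        idx = min(int(diff * n_bins), n_bins - 1)
        counts[idx] += 1

    total = len(levels_a)
    return [c / total for c in counts]


# Hypothetical windows: one with large differences, one with small ones.
dmr_like = methylation_histogram([0.9, 0.8, 0.95, 0.85], [0.1, 0.2, 0.05, 0.1])
background = methylation_histogram([0.9, 0.8, 0.95, 0.85], [0.88, 0.82, 0.9, 0.8])

print("DMR-like window:  ", dmr_like)
print("background window:", background)
```

In the DMR-like window the histogram mass sits in the upper bins, while in the background window it sits in the lowest bin. Vectors of this kind could then be passed to a classifier such as scikit-learn's `sklearn.svm.SVC`, though how HOME itself constructs and trains on its features is not specified in this record.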
format Online
Article
Text
id pubmed-6521357
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-65213572019-05-23 HOME: a histogram based machine learning approach for effective identification of differentially methylated regions Srivastava, Akanksha Karpievitch, Yuliya V. Eichten, Steven R. Borevitz, Justin O. Lister, Ryan BMC Bioinformatics Methodology Article BACKGROUND: The development of whole genome bisulfite sequencing has made it possible to identify methylation differences at single base resolution throughout an entire genome. However, a persistent challenge in DNA methylome analysis is the accurate identification of differentially methylated regions (DMRs) between samples. Sensitive and specific identification of DMRs among different conditions requires accurate and efficient algorithms, and while various tools have been developed to tackle this problem, they frequently suffer from inaccurate DMR boundary identification and high false positive rate. RESULTS: We present a novel Histogram Of MEthylation (HOME) based method that takes into account the inherent difference in the distribution of methylation levels between DMRs and non-DMRs to discriminate between the two using a Support Vector Machine. We show that generated features used by HOME are dataset-independent such that a classifier trained on, for example, a mouse methylome training set of regions of differentially accessible chromatin, can be applied to any other organism’s dataset and identify accurate DMRs. We demonstrate that DMRs identified by HOME exhibit higher association with biologically relevant genes, processes, and regulatory events compared to the existing methods. Moreover, HOME provides additional functionalities lacking in most of the current DMR finders such as DMR identification in non-CG context and time series analysis. HOME is freely available at https://github.com/ListerLab/HOME. CONCLUSION: HOME produces more accurate DMRs than the current state-of-the-art methods on both simulated and biological datasets. 
The broad applicability of HOME to identify accurate DMRs in genomic data from any organism will have a significant impact upon expanding our knowledge of how DNA methylation dynamics affect cell development and differentiation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2845-y) contains supplementary material, which is available to authorized users. BioMed Central 2019-05-16 /pmc/articles/PMC6521357/ /pubmed/31096906 http://dx.doi.org/10.1186/s12859-019-2845-y Text en © The Author(s). 2019 Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Srivastava, Akanksha
Karpievitch, Yuliya V.
Eichten, Steven R.
Borevitz, Justin O.
Lister, Ryan
HOME: a histogram based machine learning approach for effective identification of differentially methylated regions
title HOME: a histogram based machine learning approach for effective identification of differentially methylated regions
title_full HOME: a histogram based machine learning approach for effective identification of differentially methylated regions
title_fullStr HOME: a histogram based machine learning approach for effective identification of differentially methylated regions
title_full_unstemmed HOME: a histogram based machine learning approach for effective identification of differentially methylated regions
title_short HOME: a histogram based machine learning approach for effective identification of differentially methylated regions
title_sort home: a histogram based machine learning approach for effective identification of differentially methylated regions
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6521357/
https://www.ncbi.nlm.nih.gov/pubmed/31096906
http://dx.doi.org/10.1186/s12859-019-2845-y
work_keys_str_mv AT srivastavaakanksha homeahistogrambasedmachinelearningapproachforeffectiveidentificationofdifferentiallymethylatedregions
AT karpievitchyuliyav homeahistogrambasedmachinelearningapproachforeffectiveidentificationofdifferentiallymethylatedregions
AT eichtenstevenr homeahistogrambasedmachinelearningapproachforeffectiveidentificationofdifferentiallymethylatedregions
AT borevitzjustino homeahistogrambasedmachinelearningapproachforeffectiveidentificationofdifferentiallymethylatedregions
AT listerryan homeahistogrambasedmachinelearningapproachforeffectiveidentificationofdifferentiallymethylatedregions