seqlm: an MDL based method for identifying differentially methylated regions in high density methylation array data


Bibliographic Details
Main authors: Kolde, Raivo; Märtens, Kaspar; Lokk, Kaie; Laur, Sven; Vilo, Jaak
Format: Online article (text)
Language: English
Published: Oxford University Press, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5013909/
https://www.ncbi.nlm.nih.gov/pubmed/27187204
http://dx.doi.org/10.1093/bioinformatics/btw304
collection PubMed
description Motivation: One of the main goals of large-scale methylation studies is to detect differentially methylated loci. One approach is to address the problem site-wise, i.e. to find differentially methylated positions (DMPs). However, it has been shown that methylation is regulated over longer genomic regions, so it is more desirable to identify differentially methylated regions (DMRs) instead of DMPs. New high-coverage arrays, such as Illumina's 450k platform, make this possible at a reasonable cost. A few tools exist for DMR identification from this type of data, but there is no standard approach.
Results: We propose a novel method for DMR identification that detects region boundaries according to the minimum description length (MDL) principle, essentially solving the problem of model selection. The significance of the regions is established using linear mixed models. Using both simulated and large publicly available methylation datasets, we compare the performance of seqlm to that of alternative approaches. We demonstrate that it is both more sensitive and more specific than competing methods. This is achieved with minimal parameter tuning and, surprisingly, the quickest running time of all the methods tried. Finally, we show that the regional differential methylation patterns identified on sparse array data are confirmed by higher-resolution sequencing approaches.
Availability and Implementation: The methods have been implemented in the R package seqlm, which is available through GitHub: https://github.com/raivokolde/seqlm
Contact: rkolde@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
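The Results paragraph says seqlm places region boundaries by the minimum description length (MDL) principle. The Python sketch below illustrates that idea under stated assumptions. It is not seqlm's code (the package is written in R) and not its exact model. Assumed here: each candidate region is scored with a BIC-style two-part code (Gaussian data cost with one mean per sample group and a shared variance, plus (k/2)·log2 n bits for k parameters); the best split of sorted sites is found by dynamic programming; regions are capped at a hypothetical `max_len` sites and may not span a gap wider than a hypothetical `max_gap` base pairs. The names `segment_cost` and `mdl_segment` are illustrative, and the linear-mixed-model significance step is omitted.

```python
import math
from typing import List, Sequence, Tuple


def segment_cost(values: Sequence[float], groups: Sequence[int]) -> float:
    """Approximate two-part code length (bits) of one candidate region.

    Assumed model (not seqlm's exact one): each value is Gaussian around its
    group's mean with one shared variance. Data bits = Gaussian log-loss at
    the ML estimates; model bits = (k/2) * log2(n) for k free parameters.
    """
    n = len(values)
    sums: dict = {}
    counts: dict = {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    means = {g: sums[g] / counts[g] for g in sums}
    rss = sum((v - means[g]) ** 2 for v, g in zip(values, groups))
    var = max(rss / n, 1e-6)  # floor keeps log2 finite for constant data
    data_bits = 0.5 * n * math.log2(2 * math.pi * math.e * var)
    k = len(means) + 1  # one mean per group plus the shared variance
    return data_bits + 0.5 * k * math.log2(n)


def mdl_segment(
    positions: Sequence[int],
    beta: Sequence[Sequence[float]],
    groups: Sequence[int],
    max_len: int = 20,     # hypothetical cap on sites per region
    max_gap: int = 1000,   # hypothetical max distance (bp) between neighbours
) -> List[Tuple[int, int]]:
    """Split position-sorted sites into regions minimising total code length.

    beta[s][k] is the methylation value of site s in sample k and groups[k]
    is that sample's group label. Returns half-open (start, end) site indices.
    best[j] holds the shortest total code found for sites 0..j-1.
    """
    m = len(positions)
    best = [0.0] + [math.inf] * m
    back = [0] * (m + 1)
    for j in range(1, m + 1):
        values: List[float] = []
        labels: List[int] = []
        for i in range(j - 1, max(0, j - max_len) - 1, -1):
            # Stop growing leftwards once the region would span a large gap.
            if i < j - 1 and positions[i + 1] - positions[i] > max_gap:
                break
            values.extend(beta[i])  # candidate region is now sites i..j-1
            labels.extend(groups)
            cost = best[i] + segment_cost(values, labels)
            if cost < best[j]:
                best[j], back[j] = cost, i
    segments = []
    j = m
    while j > 0:  # walk the back-pointers to recover the boundaries
        segments.append((back[j], j))
        j = back[j]
    return segments[::-1]


if __name__ == "__main__":
    noise = [0.01, -0.01, 0.0, 0.01, -0.01, 0.0]
    groups = [0, 0, 0, 1, 1, 1]
    # Toy data: sites 0-4 differ between groups, sites 5-9 do not.
    beta = [[(0.2 if g == 0 else 0.8) + e for g, e in zip(groups, noise)]
            for _ in range(5)]
    beta += [[0.5 + e for e in noise] for _ in range(5)]
    positions = [100 * s for s in range(10)]
    print(mdl_segment(positions, beta, groups))  # [(0, 5), (5, 10)]
```

In this toy example, splitting a homogeneous block only adds parameter bits without shortening the data cost, so the optimum keeps each block whole. Merging the differential block with its neighbour, by contrast, inflates the shared variance and costs far more. This fit-versus-size trade-off is what lets an MDL criterion choose boundaries as a model-selection problem.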
id pubmed-5013909
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics (Original Papers)
dates 2016-09-12; 2016-09-01; 2016-05-13
rights © The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Original Papers