Notos - a galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types

BACKGROUND: DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. The relatively high costs and technical challenges associated with the detection of DNA methylation, however, have created a bias in the number of methylation studies towards model organisms....

Bibliographic Details
Main authors: Bulla, Ingo; Aliaga, Benoît; Lacal, Virginia; Bulla, Jan; Grunau, Christoph; Chaparro, Cristian
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870242/
https://www.ncbi.nlm.nih.gov/pubmed/29587630
http://dx.doi.org/10.1186/s12859-018-2115-4
author Bulla, Ingo
Aliaga, Benoît
Lacal, Virginia
Bulla, Jan
Grunau, Christoph
Chaparro, Cristian
collection PubMed
description BACKGROUND: DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. The relatively high costs and technical challenges associated with the detection of DNA methylation, however, have created a bias in the number of methylation studies towards model organisms. Consequently, it remains challenging to infer kingdom-wide general rules about the functions and evolutionary conservation of DNA methylation. Methylated cytosine is often found in specific CpN dinucleotides, and the frequency distributions of, for instance, CpG observed/expected (CpG o/e) ratios have been used to infer DNA methylation types based on the higher mutability of methylated CpG. RESULTS: Predominantly model-based approaches, essentially founded on mixtures of Gaussian distributions, are currently used to investigate questions related to the number and position of modes of CpG o/e ratios. These approaches require the selection of an appropriate criterion for determining the best model and will fail if empirical distributions are complex or even merely moderately skewed. We use a kernel density estimation (KDE) based technique for robust and precise characterization of complex CpN o/e distributions without a priori assumptions about the underlying distributions. CONCLUSIONS: We show that KDE delivers robust descriptions of CpN o/e distributions. For straightforward processing, we have developed a Galaxy tool, called Notos and available at the ToolShed, that calculates these ratios for input FASTA files and fits a density to their empirical distribution. Based on the estimated density, the number and shape of the modes of the distribution are determined, providing a rationale for the prediction of the number and the types of different methylation classes. Notos is written in R and Perl. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2115-4) contains supplementary material, which is available to authorized users.
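The workflow described in the abstract (compute a CpG o/e ratio per sequence, fit a kernel density, count its modes) can be illustrated with a minimal Python sketch. This is not the Notos implementation, which the record says is written in R and Perl. It assumes one commonly used definition of the ratio, (number of CpG × sequence length) / (number of C × number of G), a Gaussian kernel with a fixed, hand-picked bandwidth, and a naive local-maximum count. The function names `cpg_oe`, `gaussian_kde`, and `count_modes` are hypothetical and chosen only for this illustration.

```python
import math


def cpg_oe(seq: str) -> float:
    """CpG observed/expected ratio, assuming (#CG * L) / (#C * #G)."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        return 0.0  # assumption: undefined ratios are reported as 0
    return n_cg * len(seq) / (n_c * n_g)


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a fixed-bandwidth Gaussian kernel density on the grid points."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def count_modes(density):
    """Count strict interior local maxima of the density values (plateaus are ignored)."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )


# Toy o/e ratios forming two well-separated clusters (illustrative data only).
ratios = [0.3] * 5 + [1.0] * 5
grid = [i / 100 for i in range(151)]  # evaluation points from 0.00 to 1.50
density = gaussian_kde(ratios, grid, bandwidth=0.05)
n_modes = count_modes(density)
```

In practice the bandwidth choice strongly affects how many modes appear, so a data-driven bandwidth selector would normally replace the fixed value assumed here.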
format Online
Article
Text
id pubmed-5870242
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-58702422018-03-29 Notos - a galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types Bulla, Ingo Aliaga, Benoît Lacal, Virginia Bulla, Jan Grunau, Christoph Chaparro, Cristian BMC Bioinformatics Research Article BioMed Central 2018-03-27 /pmc/articles/PMC5870242/ /pubmed/29587630 http://dx.doi.org/10.1186/s12859-018-2115-4 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Notos - a galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870242/
https://www.ncbi.nlm.nih.gov/pubmed/29587630
http://dx.doi.org/10.1186/s12859-018-2115-4