Improved Prediction of Non-methylated Islands in Vertebrates Highlights Different Characteristic Sequence Patterns

Non-methylated islands (NMIs) of DNA are genomic regions that are important for gene regulation and development. A recent study of genome-wide non-methylation data in vertebrates by Long et al. (eLife 2013;2:e00348) has shown that many experimentally identified non-methylated regions do not overlap...

Bibliographic Details
Main authors: Huska, Matthew; Vingron, Martin
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5161304/
https://www.ncbi.nlm.nih.gov/pubmed/27984582
http://dx.doi.org/10.1371/journal.pcbi.1005249
author Huska, Matthew
Vingron, Martin
collection PubMed
description Non-methylated islands (NMIs) of DNA are genomic regions that are important for gene regulation and development. A recent study of genome-wide non-methylation data in vertebrates by Long et al. (eLife 2013;2:e00348) has shown that many experimentally identified non-methylated regions do not overlap with classically defined CpG islands which are computationally predicted using simple DNA sequence features. This is especially true in cold-blooded vertebrates such as Danio rerio (zebrafish). In order to investigate how predictive DNA sequence is of a region’s methylation status, we applied a supervised learning approach using a spectrum kernel support vector machine, to see if a more complex model and supervised learning can be used to improve non-methylated island prediction and to understand the sequence properties of these regions. We demonstrate that DNA sequence is highly predictive of methylation status, and that in contrast to existing CpG island prediction methods our method is able to provide more useful predictions of NMIs genome-wide in all vertebrate organisms that were studied. Our results also show that in cold-blooded vertebrates (Anolis carolinensis, Xenopus tropicalis and Danio rerio) where genome-wide classical CpG island predictions consist primarily of false positives, longer primarily AT-rich DNA sequence features are able to identify these regions much more accurately.
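The spectrum kernel named in the description compares two DNA sequences by the counts of their overlapping k-mers. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the choice of k=3, the cosine normalization, and the toy sequences are all assumptions made for illustration. The resulting Gram matrix could be handed to any SVM that accepts a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`).

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (the 'spectrum' of the sequence)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3, normalize=True):
    """Inner product of the k-mer count vectors of sequences a and b.

    With normalize=True the value is divided by the product of the two
    vector norms (an assumption here), so a sequence compared with itself
    scores 1.0 regardless of its length.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers that are absent from the other sequence.
    dot = sum(n * cb[kmer] for kmer, n in ca.items())
    if not normalize:
        return float(dot)
    norm = sqrt(sum(n * n for n in ca.values()) * sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


def gram_matrix(seqs, k=3):
    """Pairwise kernel values for a list of sequences."""
    return [[spectrum_kernel(s, t, k) for t in seqs] for s in seqs]


# Toy sequences (hypothetical, not taken from the study).
cpg_rich = "CGCGGCGCGCCGCG"
cpg_rich_2 = "GCGCGCGGCGCGCC"
at_rich = "ATTATAAATATTTA"
K = gram_matrix([cpg_rich, cpg_rich_2, at_rich], k=3)
```

In this toy setup the two CG-rich sequences share 3-mers and score above zero against each other, while the AT-rich sequence shares none with them. Larger values of k capture longer sequence features at the cost of sparser count vectors.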
format Online
Article
Text
id pubmed-5161304
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5161304 2017-01-04 PLoS Comput Biol Research Article Public Library of Science 2016-12-16 © 2016 Huska, Vingron. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Improved Prediction of Non-methylated Islands in Vertebrates Highlights Different Characteristic Sequence Patterns
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5161304/
https://www.ncbi.nlm.nih.gov/pubmed/27984582
http://dx.doi.org/10.1371/journal.pcbi.1005249