Epigenetic Marks and Variation of Sequence-Based Information Along Genomic Regions Are Predictive of Recombination Hot/Cold Spots in Saccharomyces cerevisiae

Characterization and identification of recombination hotspots provide important insights into the mechanism of recombination and genome evolution. In contrast with existing sequence-based models for predicting recombination hotspots, which were defined in an ORF-based manner, here we first defined re...

Bibliographic Details
Main Authors: Liu, Guoqing, Song, Shuangjian, Zhang, Qiguo, Dong, Biyu, Sun, Yu, Liu, Guojun, Zhao, Xiujuan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8276760/
https://www.ncbi.nlm.nih.gov/pubmed/34267784
http://dx.doi.org/10.3389/fgene.2021.705038
author Liu, Guoqing
Song, Shuangjian
Zhang, Qiguo
Dong, Biyu
Sun, Yu
Liu, Guojun
Zhao, Xiujuan
author_sort Liu, Guoqing
collection PubMed
description Characterization and identification of recombination hotspots provide important insights into the mechanism of recombination and genome evolution. In contrast with existing sequence-based models for predicting recombination hotspots, which were defined in an ORF-based manner, here we first defined recombination hot/cold spots based on public high-resolution Spo11-oligo-seq data, then characterized them in terms of DNA sequence and epigenetic marks, and finally presented classifiers to identify hotspots. We found that, in addition to some previously discovered DNA-based features like GC-skew, recombination hotspots in yeast can also be characterized by some remarkable features associated with DNA physical properties and shape. More importantly, by using DNA-based features and several epigenetic marks, we built several classifiers to discriminate hotspots from coldspots, and found that the SVM classifier performs best, with an accuracy of ∼92%, which is also the highest among the models compared. Feature importance analysis combined with prediction results shows that epigenetic marks and variation of sequence-based features along the hotspots contribute dominantly to hotspot identification. By using an incremental feature selection method, an optimal feature subset consisting of far fewer features was obtained without sacrificing prediction accuracy.
format Online
Article
Text
id pubmed-8276760
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8276760 2021-07-14 Epigenetic Marks and Variation of Sequence-Based Information Along Genomic Regions Are Predictive of Recombination Hot/Cold Spots in Saccharomyces cerevisiae Liu, Guoqing; Song, Shuangjian; Zhang, Qiguo; Dong, Biyu; Sun, Yu; Liu, Guojun; Zhao, Xiujuan Front Genet Genetics Frontiers Media S.A. 2021-06-29 /pmc/articles/PMC8276760/ /pubmed/34267784 http://dx.doi.org/10.3389/fgene.2021.705038 Text en Copyright © 2021 Liu, Song, Zhang, Dong, Sun, Liu and Zhao.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Epigenetic Marks and Variation of Sequence-Based Information Along Genomic Regions Are Predictive of Recombination Hot/Cold Spots in Saccharomyces cerevisiae
topic Genetics
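The description above names GC-skew as a DNA-based feature and reports that variation of sequence-based features along a region helps identify hotspots. The following is a minimal Python sketch of that idea, not the authors' implementation: it uses the standard definition GC-skew = (G − C) / (G + C), while the window size, the step, and the use of a population standard deviation as the "variation" measure are assumptions made here for illustration only. The function names are hypothetical.

```python
from statistics import pstdev
from typing import List


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for a DNA string; 0.0 if it contains no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int = 50, step: int = 50) -> List[float]:
    """GC-skew in successive windows along the sequence.

    The window and step sizes are illustrative assumptions, not values
    taken from the record above.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def skew_variation(seq: str, window: int = 50, step: int = 50) -> float:
    """Population standard deviation of windowed GC-skew (assumed variation measure)."""
    values = windowed_gc_skew(seq, window, step)
    return pstdev(values) if values else 0.0
```

A per-region value such as `skew_variation(region_sequence)` could then serve as one column of a feature table passed to a classifier; how the original study encoded variation along a region is not stated in this record.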