
Recombination spot identification Based on gapped k-mers

Recombination is crucial for biological evolution, which provides many new combinations of genetic diversity. Accurate identification of recombination spots is useful for DNA function study. To improve the prediction accuracy, researchers have proposed several computational methods for recombination...


Bibliographic Details
Main Authors: Wang, Rong; Xu, Yong; Liu, Bin
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4814916/
https://www.ncbi.nlm.nih.gov/pubmed/27030570
http://dx.doi.org/10.1038/srep23934
_version_ 1782424506140196864
author Wang, Rong
Xu, Yong
Liu, Bin
author_facet Wang, Rong
Xu, Yong
Liu, Bin
author_sort Wang, Rong
collection PubMed
description Recombination is crucial for biological evolution, which provides many new combinations of genetic diversity. Accurate identification of recombination spots is useful for DNA function study. To improve the prediction accuracy, researchers have proposed several computational methods for recombination spot identification. The k-mer feature is one of the most useful features for modeling the properties and function of DNA sequences. However, it suffers from the inherent limitation. If the value of word length k is large, the occurrences of k-mers are closed to a binary variable, with a few k-mers present once and most k-mers are absent. This usually causes the sparse problem and reduces the classification accuracy. To solve this problem, we add gaps into k-mer and introduce a new feature called gapped k-mer (GKM) for identification of recombination spots. By using this feature, we present a new predictor called SVM-GKM, which combines the gapped k-mers and Support Vector Machine (SVM) for recombination spot identification. Experimental results on a widely used benchmark dataset show that SVM-GKM outperforms other highly related predictors. Therefore, SVM-GKM would be a powerful predictor for computational genomics.
format Online
Article
Text
id pubmed-4814916
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-48149162016-04-04 Recombination spot identification Based on gapped k-mers Wang, Rong Xu, Yong Liu, Bin Sci Rep Article Recombination is crucial for biological evolution, which provides many new combinations of genetic diversity. Accurate identification of recombination spots is useful for DNA function study. To improve the prediction accuracy, researchers have proposed several computational methods for recombination spot identification. The k-mer feature is one of the most useful features for modeling the properties and function of DNA sequences. However, it suffers from the inherent limitation. If the value of word length k is large, the occurrences of k-mers are closed to a binary variable, with a few k-mers present once and most k-mers are absent. This usually causes the sparse problem and reduces the classification accuracy. To solve this problem, we add gaps into k-mer and introduce a new feature called gapped k-mer (GKM) for identification of recombination spots. By using this feature, we present a new predictor called SVM-GKM, which combines the gapped k-mers and Support Vector Machine (SVM) for recombination spot identification. Experimental results on a widely used benchmark dataset show that SVM-GKM outperforms other highly related predictors. Therefore, SVM-GKM would be a powerful predictor for computational genomics. Nature Publishing Group 2016-03-31 /pmc/articles/PMC4814916/ /pubmed/27030570 http://dx.doi.org/10.1038/srep23934 Text en Copyright © 2016, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. 
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
spellingShingle Article
Wang, Rong
Xu, Yong
Liu, Bin
Recombination spot identification Based on gapped k-mers
title Recombination spot identification Based on gapped k-mers
title_full Recombination spot identification Based on gapped k-mers
title_fullStr Recombination spot identification Based on gapped k-mers
title_full_unstemmed Recombination spot identification Based on gapped k-mers
title_short Recombination spot identification Based on gapped k-mers
title_sort recombination spot identification based on gapped k-mers
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4814916/
https://www.ncbi.nlm.nih.gov/pubmed/27030570
http://dx.doi.org/10.1038/srep23934
work_keys_str_mv AT wangrong recombinationspotidentificationbasedongappedkmers
AT xuyong recombinationspotidentificationbasedongappedkmers
AT liubin recombinationspotidentificationbasedongappedkmers
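
Illustrative note: the description field above contrasts ordinary k-mers, whose counts become sparse as k grows, with gapped k-mers. The following is a minimal sketch of that idea only, not the SVM-GKM implementation. It assumes one common definition of a gapped k-mer (a window of length l in which k positions are kept and the remaining l - k positions are masked); the record itself does not state the exact definition the authors use. The function names kmer_counts and gapped_kmer_counts, the parameter names l and k, and the '-' gap symbol are hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count contiguous k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gapped_kmer_counts(seq, l, k):
    """Count gapped k-mers under the assumed definition: for every window
    of length l, keep each choice of k positions and mask the other
    l - k positions with '-'."""
    if not 0 < k <= l:
        raise ValueError("need 0 < k <= l")
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        for keep in combinations(range(l), k):
            kept = set(keep)
            pattern = "".join(
                c if j in kept else "-" for j, c in enumerate(window)
            )
            counts[pattern] += 1
    return counts


if __name__ == "__main__":
    seq = "ACGTACGT"  # toy sequence, not data from the paper
    print(kmer_counts(seq, 3))
    print(gapped_kmer_counts(seq, 3, 2))
```

Because each masked pattern matches several distinct contiguous words, its counts are pooled over those words, which is the intuition behind using gaps to reduce sparsity. In a classifier such counts would typically be turned into a fixed-length feature vector before training an SVM; that step is omitted here.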