Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM


Bibliographic Details
Main authors: Li, Liqi; Yu, Sanjiu; Xiao, Weidong; Li, Yongsheng; Huang, Lan; Zheng, Xiaoqi; Zhou, Shiwen; Yang, Hua
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4289199/
https://www.ncbi.nlm.nih.gov/pubmed/25409550
http://dx.doi.org/10.1186/1471-2105-15-340
collection PubMed
description BACKGROUND: Identification of recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the process of genome evolution. However, experimental identification of recombination spots is both time-consuming and costly. An accurate, automated method for reliably and quickly identifying recombination spots is therefore urgently needed. RESULTS: Here we propose a novel approach that fuses features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). Recursive feature extraction with a linear-kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract the optimal features, and an SVM was adopted to identify recombination spots from these optimal features. To evaluate the performance of the proposed method, a jackknife cross-validation test was performed on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was higher (by 0.37% to 3.79%) than those of state-of-the-art tools. CONCLUSIONS: The comparison results suggest that linear-kernel SVM is a useful vehicle for identifying recombination hot/cold spots.
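The abstract names NAC and n-tier NAC as two of the fused feature groups. The following is a minimal sketch, assuming that n-tier NAC means the concatenation of normalized k-mer frequencies for k = 1..n over the ACGT alphabet (NAC being the k = 1 tier). The function names `kmer_composition` and `n_tier_nac` are hypothetical, and the paper's exact normalization and choice of n may differ. PseDNC, the recursive feature ranking and the SVM classifier are not shown.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized frequencies of all 4**k k-mers, in lexicographic order (AA, AC, AG, ...)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k across the sequence (step 1).
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other ambiguity codes are skipped
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[w] / total for w in kmers]


def n_tier_nac(seq: str, n: int = 3) -> list[float]:
    """Concatenate k-mer compositions for k = 1..n (4 + 16 + 64 = 84 features when n = 3)."""
    features: list[float] = []
    for k in range(1, n + 1):
        features.extend(kmer_composition(seq, k))
    return features


if __name__ == "__main__":
    vec = n_tier_nac("ACGTAC", n=2)
    print(len(vec))  # 20 features: 4 mononucleotide + 16 dinucleotide frequencies
```

For "ACGTAC" with n = 2, the first four entries are the A, C, G, T frequencies (1/3, 1/3, 1/6, 1/6), and the dinucleotide tier gives AC a frequency of 0.4 because AC occurs in two of the five overlapping windows. Each tier sums to 1, which keeps tiers of different lengths on a comparable scale before any feature ranking.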
id pubmed-4289199
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
BMC Bioinformatics (Research Article), BioMed Central, published 2014-11-20. © Li et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.