Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM
BACKGROUND: Identification of the recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the genome evolution process. However, experimental identification of recombination spots is both time-consuming and costly. Developing an accurate and automated met...
Main authors: | Li, Liqi; Yu, Sanjiu; Xiao, Weidong; Li, Yongsheng; Huang, Lan; Zheng, Xiaoqi; Zhou, Shiwen; Yang, Hua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4289199/ https://www.ncbi.nlm.nih.gov/pubmed/25409550 http://dx.doi.org/10.1186/1471-2105-15-340 |
_version_ | 1782352064143163392 |
---|---|
author | Li, Liqi; Yu, Sanjiu; Xiao, Weidong; Li, Yongsheng; Huang, Lan; Zheng, Xiaoqi; Zhou, Shiwen; Yang, Hua |
author_facet | Li, Liqi; Yu, Sanjiu; Xiao, Weidong; Li, Yongsheng; Huang, Lan; Zheng, Xiaoqi; Zhou, Shiwen; Yang, Hua |
author_sort | Li, Liqi |
collection | PubMed |
description | BACKGROUND: Identification of recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the genome evolution process. However, experimental identification of recombination spots is both time-consuming and costly. An accurate, automated method for identifying recombination spots reliably and quickly is thus urgently needed. RESULTS: Here we proposed a novel approach that fuses features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). Recursive feature extraction by a linear kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract optimal features. An SVM was adopted for identifying recombination spots based on these optimal features. To evaluate the performance of the proposed method, a jackknife cross-validation test was employed on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was 0.37% to 3.79% higher than those of state-of-the-art tools. CONCLUSIONS: Comparison results suggested that linear kernel SVM is a useful vehicle for identifying recombination hot/cold spots. |
format | Online Article Text |
id | pubmed-4289199 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4289199 2015-01-11 Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM Li, Liqi; Yu, Sanjiu; Xiao, Weidong; Li, Yongsheng; Huang, Lan; Zheng, Xiaoqi; Zhou, Shiwen; Yang, Hua BMC Bioinformatics Research Article BACKGROUND: Identification of recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the genome evolution process. However, experimental identification of recombination spots is both time-consuming and costly. An accurate, automated method for identifying recombination spots reliably and quickly is thus urgently needed. RESULTS: Here we proposed a novel approach that fuses features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). Recursive feature extraction by a linear kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract optimal features. An SVM was adopted for identifying recombination spots based on these optimal features. To evaluate the performance of the proposed method, a jackknife cross-validation test was employed on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was 0.37% to 3.79% higher than those of state-of-the-art tools. CONCLUSIONS: Comparison results suggested that linear kernel SVM is a useful vehicle for identifying recombination hot/cold spots. BioMed Central 2014-11-20 /pmc/articles/PMC4289199/ /pubmed/25409550 http://dx.doi.org/10.1186/1471-2105-15-340 Text en © Li et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Li, Liqi Yu, Sanjiu Xiao, Weidong Li, Yongsheng Huang, Lan Zheng, Xiaoqi Zhou, Shiwen Yang, Hua Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM |
title | Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM |
title_sort | sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel svm |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4289199/ https://www.ncbi.nlm.nih.gov/pubmed/25409550 http://dx.doi.org/10.1186/1471-2105-15-340 |
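
The description field above outlines a pipeline of composition-based sequence features evaluated with a jackknife (leave-one-out) test. The following is only a minimal sketch of those two ideas, not the authors' implementation. Assumptions: mono- and dinucleotide frequencies stand in for the NAC-style features (the physicochemical correlation terms of PseDNC are not reproduced), the recursive feature ranking step is omitted, a nearest-centroid rule is swapped in for the SVM to keep the example dependency-free, and the function names and toy sequences are hypothetical.

```python
from itertools import product


def kmer_composition(seq, k):
    """Return normalized frequencies of all 4**k DNA k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def fused_features(seq):
    """Concatenate mono- and dinucleotide composition (4 + 16 = 20 values)."""
    return kmer_composition(seq, 1) + kmer_composition(seq, 2)


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (an assumption, not the SVM of the record):
    return the label whose class mean is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [v for v, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(features, labels):
    """Leave-one-out test: each sample is predicted from all the other samples."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy sequences invented for illustration; they are not data from the article.
toy = [("GCGCGGCCGC", "hot"), ("GGCCGCGCGG", "hot"), ("CCGGCGCGCC", "hot"),
       ("ATATTAATAT", "cold"), ("TTAATATATA", "cold"), ("AATTATATTA", "cold")]
X = [fused_features(s) for s, _ in toy]
y = [label for _, label in toy]
accuracy = jackknife_accuracy(X, y)
```

In a jackknife test every sample is held out exactly once, so the reported accuracy is the fraction of held-out samples predicted correctly. With a real benchmark, a linear-kernel SVM and a feature-ranking step would replace the stand-in classifier used here.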