InDel marker detection by integration of multiple softwares using machine learning techniques

BACKGROUND: In biological experiments on soybean, molecular markers are widely used to verify the soybean genome or to construct its genetic map. Among the various molecular markers, insertions and deletions (InDels) are preferred for their wide distribution and high density a...

Bibliographic Details
Main authors: Yang, Jianqiu, Shi, Xinyi, Hu, Lun, Luo, Daipeng, Peng, Jing, Xiong, Shengwu, Kong, Fanjing, Liu, Baohui, Yuan, Xiaohui
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6889189/
https://www.ncbi.nlm.nih.gov/pubmed/27806691
http://dx.doi.org/10.1186/s12859-016-1312-2
author Yang, Jianqiu
Shi, Xinyi
Hu, Lun
Luo, Daipeng
Peng, Jing
Xiong, Shengwu
Kong, Fanjing
Liu, Baohui
Yuan, Xiaohui
collection PubMed
description BACKGROUND: In biological experiments on soybean, molecular markers are widely used to verify the soybean genome or to construct its genetic map. Among the various molecular markers, insertions and deletions (InDels) are preferred for their wide distribution and high density at the whole-genome level. Detecting InDels from next-generation sequencing data is therefore of great importance for the design of InDel markers. To tackle this problem, this paper integrated machine learning techniques with existing software and developed two algorithms for InDel detection: the best F-score method (BF-M) and the Support Vector Machine (SVM) method (SVM-M), which is based on the classical SVM model. RESULTS: The experimental results show that BF-M performed promisingly, as indicated by its high precision and recall scores, whereas SVM-M yielded the best performance in terms of recall and F-score. Moreover, from the InDel markers detected by SVM-M in soybeans collected from 56 different regions, highly polymorphic loci were selected to construct an InDel marker database for soybean. CONCLUSIONS: Compared to existing software tools, the two algorithms proposed in this work produced substantially higher precision and recall scores and remained stable across various types of genomic regions. Moreover, based on SVM-M, we have constructed a database for soybean InDel markers and published it for academic research.
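The abstract compares tools by precision, recall, and F-score. As a rough illustration of how those three scores relate, the sketch below scores two sets of InDel calls against a truth set and picks the one with the higher F-score. It is not a reconstruction of BF-M or SVM-M; the tool names, chromosome labels, positions, and the function `precision_recall_f1` are all hypothetical.

```python
def precision_recall_f1(called, truth):
    """Score a set of called InDels against a truth set (exact-match comparison)."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score (F1): harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical InDel calls as (chromosome, position, type) tuples
truth = {
    ("Chr01", 100, "INS"),
    ("Chr01", 250, "DEL"),
    ("Chr02", 40, "DEL"),
    ("Chr03", 75, "INS"),
}
calls_by_tool = {
    "tool_a": {("Chr01", 100, "INS"), ("Chr01", 250, "DEL"), ("Chr02", 999, "DEL")},
    "tool_b": {
        ("Chr01", 100, "INS"),
        ("Chr02", 40, "DEL"),
        ("Chr03", 75, "INS"),
        ("Chr03", 80, "INS"),
    },
}

scores = {name: precision_recall_f1(calls, truth) for name, calls in calls_by_tool.items()}
# Select the tool whose calls give the highest F-score on this truth set
best_tool = max(scores, key=lambda name: scores[name][2])
```

Exact tuple matching is a simplification; real InDel comparisons usually need some tolerance for position and representation differences between callers.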
format Online
Article
Text
id pubmed-6889189
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6889189 2019-12-11 InDel marker detection by integration of multiple softwares using machine learning techniques. BMC Bioinformatics. Methodology Article. BioMed Central 2016-11-02 /pmc/articles/PMC6889189/ /pubmed/27806691 http://dx.doi.org/10.1186/s12859-016-1312-2 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title InDel marker detection by integration of multiple softwares using machine learning techniques
topic Methodology Article