Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging
Main authors: | Yang, Cheng-Hong; Wu, Kuo-Chuan; Chuang, Li-Yeh; Chang, Hsueh-Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5846911/ https://www.ncbi.nlm.nih.gov/pubmed/29551885 http://dx.doi.org/10.1177/1176934318760856 |
field | value |
---|---|
author | Yang, Cheng-Hong; Wu, Kuo-Chuan; Chuang, Li-Yeh; Chang, Hsueh-Wei |
author_sort | Yang, Cheng-Hong |
collection | PubMed |
description | DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence longer than 1000 base pairs, which imposes a computational burden. Although DNA barcodes were originally envisioned as straightforward species tags, their use for identification is rarely emphasized today. Single-nucleotide polymorphism (SNP) association studies suggest that SNPs may be ideal targets of feature selection for discriminating between species. We hypothesize that SNP-based barcodes may be more effective than full-length DNA barcode sequences for species discrimination. To address this issue, we tested a ribulose diphosphate carboxylase (rbcL) SNP barcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences of 38 Brassicaceae plant species. During decision tree construction, these SNPs were used to set the decision rule that splits the sequences into 2 groups at each level. After processing, the algorithm required 37 nodes and 31 loci to discriminate the 38 species. Finally, sequence tags consisting of 31 rbcL SNP barcodes were identified for discriminating the 38 Brassicaceae species, based on the decision tree–selected SNP pattern of the RSB method. Taken together, this study provides the rationale that the SNPs of the rbcL DNA barcode are useful and effective for tagging 38 Brassicaceae species. |
format | Online Article Text |
id | pubmed-5846911 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-5846911; 2018-03-16; Evol Bioinform Online; Original Research
SAGE Publications 2018-03-05 /pmc/articles/PMC5846911/ /pubmed/29551885 http://dx.doi.org/10.1177/1176934318760856 Text en © The Author(s) 2018 http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
title | Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5846911/ https://www.ncbi.nlm.nih.gov/pubmed/29551885 http://dx.doi.org/10.1177/1176934318760856 |
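
As a rough illustration of the SNP-discovery step described in the abstract, the Python sketch below scans an alignment column by column, keeps the positions where more than one nucleotide occurs, and then reads each species' SNP barcode off those positions. It assumes the sequences are already aligned and trimmed to equal length and ignores gaps and ambiguity codes; the function names (`find_snp_positions`, `snp_barcodes`) and the toy sequences are hypothetical and are not the authors' pipeline or data.

```python
from typing import Dict, List


def find_snp_positions(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based columns where at least two different nucleotides occur.

    Assumption (not from the source): sequences are already aligned and trimmed
    to equal length; gaps ('-') and ambiguity codes are ignored.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        # Keep only unambiguous nucleotides observed in this column.
        bases = {s[i].upper() for s in seqs} & set("ACGT")
        if len(bases) > 1:
            positions.append(i)
    return positions


def snp_barcodes(alignment: Dict[str, str], positions: List[int]) -> Dict[str, str]:
    """Concatenate each sequence's nucleotides at the SNP positions."""
    return {
        name: "".join(seq[i].upper() for i in positions)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not real rbcL data.
    toy = {
        "species_A": "ATGTCACCACAAACAGAG",
        "species_B": "ATGTCACCGCAAACAGAG",
        "species_C": "ATGTCACCACAAACTGAG",
    }
    snps = find_snp_positions(toy)
    print(snps)                     # [8, 14]
    print(snp_barcodes(toy, snps))  # {'species_A': 'AA', 'species_B': 'GA', 'species_C': 'AT'}
```

Whether a column containing a gap should count as polymorphic is a design choice; this sketch resolves it by ignoring gaps, which a real trimmed rbcL alignment may or may not require.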
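
The decision-tree step in the abstract can be sketched in a similar way. The abstract states only that the SNPs set a decision rule that splits the sequences into 2 groups at each level; the splitting criterion below (ask whether a barcode carries a given nucleotide at a given SNP, and choose the question that divides the current group most evenly) is a hypothetical stand-in, not the rule used by the RSB method. The names `best_split`, `build_tree`, `internal_nodes`, `loci_used`, and `classify`, and the toy barcodes, are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Question = Tuple[int, str]  # (SNP index within the barcode, nucleotide)


@dataclass
class Node:
    species: List[str]
    question: Optional[Question] = None  # None means this node is a leaf
    yes: Optional["Node"] = None         # barcodes carrying that nucleotide
    no: Optional["Node"] = None          # all other barcodes


def best_split(barcodes: Dict[str, str], names: List[str]) -> Optional[Question]:
    """Hypothetical criterion: the (index, base) question that splits most evenly."""
    best: Optional[Question] = None
    best_score = 0
    for i in range(len(barcodes[names[0]])):
        for base in sorted({barcodes[n][i] for n in names}):
            matches = sum(barcodes[n][i] == base for n in names)
            score = min(matches, len(names) - matches)  # size of the smaller group
            if score > best_score:
                best, best_score = (i, base), score
    return best  # None if no SNP separates these species


def build_tree(barcodes: Dict[str, str], names: Optional[List[str]] = None) -> Node:
    """Recursively split species into 2 groups until each leaf holds one barcode."""
    names = sorted(barcodes) if names is None else names
    node = Node(species=names)
    split = best_split(barcodes, names) if len(names) > 1 else None
    if split is None:  # a single species, or species with identical barcodes
        return node
    i, base = split
    node.question = split
    node.yes = build_tree(barcodes, [n for n in names if barcodes[n][i] == base])
    node.no = build_tree(barcodes, [n for n in names if barcodes[n][i] != base])
    return node


def internal_nodes(node: Node) -> int:
    """Count the decision (non-leaf) nodes."""
    if node.question is None:
        return 0
    return 1 + internal_nodes(node.yes) + internal_nodes(node.no)


def loci_used(node: Node) -> Set[int]:
    """Collect the distinct SNP indices tested anywhere in the tree."""
    if node.question is None:
        return set()
    return {node.question[0]} | loci_used(node.yes) | loci_used(node.no)


def classify(node: Node, barcode: str) -> List[str]:
    """Follow the yes/no questions down to a leaf."""
    while node.question is not None:
        i, base = node.question
        node = node.yes if barcode[i] == base else node.no
    return node.species


if __name__ == "__main__":
    # Hypothetical 3-SNP barcodes for 4 species (not data from the study).
    toy = {"sp1": "AAC", "sp2": "GAC", "sp3": "ATC", "sp4": "ATG"}
    tree = build_tree(toy)
    print(internal_nodes(tree))     # 3
    print(sorted(loci_used(tree)))  # [0, 1, 2]
    print(classify(tree, "ATG"))    # ['sp4']
```

When all barcodes are distinct, every split yields two non-empty groups and the tree ends with one species per leaf. An information-gain or Gini criterion, as used in common decision tree learners, could be swapped in for the balanced-split rule without changing the overall structure.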
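
The node and locus counts reported in the abstract agree with simple tree arithmetic. Assuming each internal node splits its group into exactly 2 subgroups and each of the 38 species ends in its own leaf (consistent with, though not spelled out in, the abstract), counting the edges of the tree in two ways gives the number of internal nodes $I$ from the number of leaves $L$:

$$
\begin{aligned}
\text{edges} &= (I + L) - 1 && \text{(every node except the root has one parent)}\\
\text{edges} &= 2I && \text{(every internal node has two children)}\\
\Rightarrow\; I &= L - 1 = 38 - 1 = 37.
\end{aligned}
$$

If each internal node tests one SNP locus, at most 37 distinct loci could appear in the tree; the reported 31 loci therefore implies that at least one locus is tested at more than one node.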