
Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging

DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence larger than 1000 base pairs and generates a computational burden. Although the DNA barcode was originally envisioned as straightforward species tags, the identification usage of barcode sequences is rarely e...


Bibliographic Details
Main Authors: Yang, Cheng-Hong, Wu, Kuo-Chuan, Chuang, Li-Yeh, Chang, Hsueh-Wei
Format: Online Article Text
Language: English
Published: SAGE Publications 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5846911/
https://www.ncbi.nlm.nih.gov/pubmed/29551885
http://dx.doi.org/10.1177/1176934318760856
_version_ 1783305655099064320
author Yang, Cheng-Hong
Wu, Kuo-Chuan
Chuang, Li-Yeh
Chang, Hsueh-Wei
author_facet Yang, Cheng-Hong
Wu, Kuo-Chuan
Chuang, Li-Yeh
Chang, Hsueh-Wei
author_sort Yang, Cheng-Hong
collection PubMed
description DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence larger than 1000 base pairs and generates a computational burden. Although the DNA barcode was originally envisioned as straightforward species tags, the identification usage of barcode sequences is rarely emphasized currently. Single-nucleotide polymorphism (SNP) association studies suggest that SNPs may be the ideal target of feature selection to discriminate between different species. We hypothesize that SNP-based barcodes may be more effective than full-length DNA barcode sequences for species discrimination. To address this issue, we tested a ribulose diphosphate carboxylase (rbcL) SNP barcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences from 38 Brassicaceae plant species. In the decision tree construction, these SNPs were computed to set up the decision rule to assign the sequences into 2 groups level by level. After algorithm processing, 37 nodes and 31 loci were required for discriminating 38 species. Finally, the sequence tags consisting of 31 rbcL SNP barcodes were identified for discriminating 38 Brassicaceae species based on the decision tree–selected SNP pattern using the RSB method. Taken together, this study provides the rationale that the SNP aspect of the DNA barcode for the rbcL gene is a useful and effective sequence for tagging 38 Brassicaceae species.
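The level-by-level splitting described in the abstract can be illustrated with a minimal sketch. It is not the authors' RSB implementation: the species names, the toy sequences, the function names, and the "most balanced split" heuristic are all assumptions made for illustration, since the record does not state the split criterion. The sketch finds the variable columns of an alignment, repeatedly divides the species into 2 groups on one SNP locus per node, and reads off each species' barcode at the loci the tree used. A binary tree that separates n species always has n - 1 internal nodes, which matches the 37 nodes reported for 38 species.

```python
from collections import Counter

def find_snp_columns(seqs):
    """Return alignment columns where at least two different bases occur."""
    length = len(next(iter(seqs.values())))
    return [i for i in range(length)
            if len({s[i] for s in seqs.values()}) > 1]

def build_tree(seqs, columns):
    """Recursively split the species into 2 groups on one SNP column per node.

    Assumed heuristic (not from the source): choose the column whose most
    common base divides the current group most evenly.
    """
    names = sorted(seqs)
    if len(names) == 1:
        return names[0]  # leaf: a single species remains
    best = None
    for col in columns:
        counts = Counter(seqs[n][col] for n in names)
        if len(counts) < 2:
            continue  # column is not variable within this group
        base, count = counts.most_common(1)[0]
        score = abs(len(names) - 2 * count)  # 0 means a perfect half split
        if best is None or score < best[0]:
            best = (score, col, base)
    if best is None:
        raise ValueError("not separable by SNPs: " + ", ".join(names))
    _, col, base = best
    match = {n: seqs[n] for n in names if seqs[n][col] == base}
    other = {n: seqs[n] for n in names if seqs[n][col] != base}
    return {"locus": col, "base": base,
            "match": build_tree(match, columns),
            "other": build_tree(other, columns)}

def internal_nodes(tree):
    """List every decision node (non-leaf) in the tree."""
    if isinstance(tree, str):
        return []
    return [tree] + internal_nodes(tree["match"]) + internal_nodes(tree["other"])

def classify(tree, seq):
    """Follow the decision rules from the root down to a species name."""
    while not isinstance(tree, str):
        branch = "match" if seq[tree["locus"]] == tree["base"] else "other"
        tree = tree[branch]
    return tree

# Hypothetical aligned and trimmed sequences (not real rbcL data).
toy = {
    "species_A": "ACGTAC",
    "species_B": "ACGTAT",
    "species_C": "ATGTAC",
    "species_D": "ATGAAC",
}

snps = find_snp_columns(toy)
tree = build_tree(toy, snps)
loci = sorted({node["locus"] for node in internal_nodes(tree)})
barcodes = {name: "".join(seq[i] for i in loci) for name, seq in toy.items()}

for name, code in sorted(barcodes.items()):
    print(name, code)
```

Only the loci that appear in the tree are kept in each barcode, so the tag is much shorter than the full alignment while still distinguishing every species in the toy set.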
format Online
Article
Text
id pubmed-5846911
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-58469112018-03-16 Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging Yang, Cheng-Hong Wu, Kuo-Chuan Chuang, Li-Yeh Chang, Hsueh-Wei Evol Bioinform Online Original Research DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence larger than 1000 base pairs and generates a computational burden. Although the DNA barcode was originally envisioned as straightforward species tags, the identification usage of barcode sequences is rarely emphasized currently. Single-nucleotide polymorphism (SNP) association studies suggest that SNPs may be the ideal target of feature selection to discriminate between different species. We hypothesize that SNP-based barcodes may be more effective than full-length DNA barcode sequences for species discrimination. To address this issue, we tested a ribulose diphosphate carboxylase (rbcL) SNP barcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences from 38 Brassicaceae plant species. In the decision tree construction, these SNPs were computed to set up the decision rule to assign the sequences into 2 groups level by level. After algorithm processing, 37 nodes and 31 loci were required for discriminating 38 species. Finally, the sequence tags consisting of 31 rbcL SNP barcodes were identified for discriminating 38 Brassicaceae species based on the decision tree–selected SNP pattern using the RSB method. Taken together, this study provides the rationale that the SNP aspect of the DNA barcode for the rbcL gene is a useful and effective sequence for tagging 38 Brassicaceae species.
SAGE Publications 2018-03-05 /pmc/articles/PMC5846911/ /pubmed/29551885 http://dx.doi.org/10.1177/1176934318760856 Text en © The Author(s) 2018 http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
spellingShingle Original Research
Yang, Cheng-Hong
Wu, Kuo-Chuan
Chuang, Li-Yeh
Chang, Hsueh-Wei
Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging
title Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging
title_full Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging
title_fullStr Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging
title_full_unstemmed Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging
title_short Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging
title_sort decision tree algorithm–generated single-nucleotide polymorphism barcodes of rbcl genes for 38 brassicaceae species tagging
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5846911/
https://www.ncbi.nlm.nih.gov/pubmed/29551885
http://dx.doi.org/10.1177/1176934318760856
work_keys_str_mv AT yangchenghong decisiontreealgorithmgeneratedsinglenucleotidepolymorphismbarcodesofrbclgenesfor38brassicaceaespeciestagging
AT wukuochuan decisiontreealgorithmgeneratedsinglenucleotidepolymorphismbarcodesofrbclgenesfor38brassicaceaespeciestagging
AT chuangliyeh decisiontreealgorithmgeneratedsinglenucleotidepolymorphismbarcodesofrbclgenesfor38brassicaceaespeciestagging
AT changhsuehwei decisiontreealgorithmgeneratedsinglenucleotidepolymorphismbarcodesofrbclgenesfor38brassicaceaespeciestagging