matK-QR classifier: a patterns based approach for plant species identification
BACKGROUND: DNA barcoding is a widely used and highly efficient approach that facilitates rapid and accurate identification of plant species based on a short standardized segment of the genome. The nucleotide sequences of maturaseK (matK) and ribulose-1,5-bisphosphate carboxylase (rbcL) marker loci a...
Main authors: | More, Ravi Prabhakar; Mane, Rupali Chandrashekhar; Purohit, Hemant J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5148893/ https://www.ncbi.nlm.nih.gov/pubmed/27990177 http://dx.doi.org/10.1186/s13040-016-0120-6 |
_version_ | 1782473904167583744 |
---|---|
author | More, Ravi Prabhakar Mane, Rupali Chandrashekhar Purohit, Hemant J. |
author_facet | More, Ravi Prabhakar Mane, Rupali Chandrashekhar Purohit, Hemant J. |
author_sort | More, Ravi Prabhakar |
collection | PubMed |
description | BACKGROUND: DNA barcoding is a widely used and highly efficient approach that facilitates rapid and accurate identification of plant species based on a short standardized segment of the genome. The nucleotide sequences of maturaseK (matK) and ribulose-1,5-bisphosphate carboxylase (rbcL) marker loci are commonly used in plant species identification. Here, we present a new and highly efficient approach for identifying a unique set of discriminating nucleotide patterns to generate a signature (i.e. regular expression) for plant species identification. METHODS: In order to generate molecular signatures, we used matK and rbcL loci datasets, which encompass 125 plant species in 52 genera reported by the CBOL plant working group. Initially, we performed Multiple Sequence Alignment (MSA) of all species, followed by computation of a Position Specific Scoring Matrix (PSSM) for both loci to estimate the percentage of discrimination among species. Further, we detected Discriminating Patterns (DP) at genus and species level using PSSM for the matK dataset. Combining DP and consecutive pattern distances, we generated molecular signatures for each species. Finally, we performed a comparative assessment of these signatures with the existing methods including BLASTn, Support Vector Machines (SVM), Jrip-RIPPER, J48 (C4.5 algorithm), and the Naïve Bayes (NB) methods against the NCBI-GenBank matK dataset. RESULTS: Due to the higher discrimination success obtained with matK than with rbcL, we selected the matK gene for signature generation. We generated signatures for 60 species based on identified discriminating patterns at genus and species level. Our comparative assessment results suggest that a total of 46 out of 60 species could be correctly identified using generated signatures, followed by BLASTn (34 species), SVM (18 species), C4.5 (7 species), NB (4 species) and RIPPER (3 species) methods. As a final outcome of this study, we converted signatures into QR codes and developed a software tool, matK-QR Classifier (http://www.neeri.res.in/matk_classifier/index.htm), which searches for signatures in query matK gene sequences and predicts the corresponding plant species. CONCLUSIONS: This novel approach of employing pattern-based signatures opens new avenues for the classification of species. In addition to existing methods, we believe that matK-QR Classifier would be a valuable tool for molecular taxonomists enabling precise identification of plant species. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-016-0120-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5148893 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5148893 2016-12-16 matK-QR classifier: a patterns based approach for plant species identification More, Ravi Prabhakar Mane, Rupali Chandrashekhar Purohit, Hemant J. BioData Min Research BACKGROUND: DNA barcoding is a widely used and highly efficient approach that facilitates rapid and accurate identification of plant species based on a short standardized segment of the genome. The nucleotide sequences of maturaseK (matK) and ribulose-1,5-bisphosphate carboxylase (rbcL) marker loci are commonly used in plant species identification. Here, we present a new and highly efficient approach for identifying a unique set of discriminating nucleotide patterns to generate a signature (i.e. regular expression) for plant species identification. METHODS: In order to generate molecular signatures, we used matK and rbcL loci datasets, which encompass 125 plant species in 52 genera reported by the CBOL plant working group. Initially, we performed Multiple Sequence Alignment (MSA) of all species, followed by computation of a Position Specific Scoring Matrix (PSSM) for both loci to estimate the percentage of discrimination among species. Further, we detected Discriminating Patterns (DP) at genus and species level using PSSM for the matK dataset. Combining DP and consecutive pattern distances, we generated molecular signatures for each species. Finally, we performed a comparative assessment of these signatures with the existing methods including BLASTn, Support Vector Machines (SVM), Jrip-RIPPER, J48 (C4.5 algorithm), and the Naïve Bayes (NB) methods against the NCBI-GenBank matK dataset. RESULTS: Due to the higher discrimination success obtained with matK than with rbcL, we selected the matK gene for signature generation. We generated signatures for 60 species based on identified discriminating patterns at genus and species level. Our comparative assessment results suggest that a total of 46 out of 60 species could be correctly identified using generated signatures, followed by BLASTn (34 species), SVM (18 species), C4.5 (7 species), NB (4 species) and RIPPER (3 species) methods. As a final outcome of this study, we converted signatures into QR codes and developed a software tool, matK-QR Classifier (http://www.neeri.res.in/matk_classifier/index.htm), which searches for signatures in query matK gene sequences and predicts the corresponding plant species. CONCLUSIONS: This novel approach of employing pattern-based signatures opens new avenues for the classification of species. In addition to existing methods, we believe that matK-QR Classifier would be a valuable tool for molecular taxonomists enabling precise identification of plant species. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-016-0120-6) contains supplementary material, which is available to authorized users. BioMed Central 2016-12-09 /pmc/articles/PMC5148893/ /pubmed/27990177 http://dx.doi.org/10.1186/s13040-016-0120-6 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research More, Ravi Prabhakar Mane, Rupali Chandrashekhar Purohit, Hemant J. matK-QR classifier: a patterns based approach for plant species identification |
title | matK-QR classifier: a patterns based approach for plant species identification |
title_full | matK-QR classifier: a patterns based approach for plant species identification |
title_fullStr | matK-QR classifier: a patterns based approach for plant species identification |
title_full_unstemmed | matK-QR classifier: a patterns based approach for plant species identification |
title_short | matK-QR classifier: a patterns based approach for plant species identification |
title_sort | matk-qr classifier: a patterns based approach for plant species identification |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5148893/ https://www.ncbi.nlm.nih.gov/pubmed/27990177 http://dx.doi.org/10.1186/s13040-016-0120-6 |
work_keys_str_mv | AT moreraviprabhakar matkqrclassifierapatternsbasedapproachforplantspeciesidentification AT manerupalichandrashekhar matkqrclassifierapatternsbasedapproachforplantspeciesidentification AT purohithemantj matkqrclassifierapatternsbasedapproachforplantspeciesidentification |
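
The description field above says that each species signature is a regular expression built from discriminating patterns and the distances between consecutive patterns, and that the classifier searches query matK sequences for these signatures. The following is a minimal Python sketch of that idea only, not the authors' implementation. The function names `build_signature` and `classify`, the species labels, the motifs, and the choice to model each distance as a fixed-length `[ACGT]{n}` wildcard are all assumptions made for illustration; none of them come from the record.

```python
import re


def build_signature(patterns):
    """Join (start, motif) pairs into one regular expression.

    `patterns` must be sorted by start offset. The gap between two
    consecutive motifs is encoded as a fixed-length wildcard, which is
    an assumption of this sketch rather than the published method.
    """
    parts = []
    prev_end = None
    for start, motif in patterns:
        if prev_end is not None:
            gap = start - prev_end
            if gap < 0:
                raise ValueError("patterns must not overlap")
            parts.append("[ACGT]{%d}" % gap)
        parts.append(re.escape(motif))
        prev_end = start + len(motif)
    return "".join(parts)


def classify(query, signatures):
    """Return the names of all signatures found anywhere in the query."""
    query = query.upper()
    return [name for name, sig in signatures.items() if re.search(sig, query)]


# Hypothetical toy signatures; the motifs and offsets are invented.
signatures = {
    "species_A": build_signature([(0, "ATGGA"), (8, "CCT")]),
    "species_B": build_signature([(0, "ATGCA"), (8, "CCT")]),
}

if __name__ == "__main__":
    print(signatures["species_A"])
    print(classify("ttatggaacgcctaa", signatures))
```

In this toy setup the two signatures differ in a single motif, so a query carrying the first motif at the right distance from the shared `CCT` motif matches only `species_A`. Real signatures would be derived from the PSSM of aligned matK sequences, which this sketch does not attempt.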