matK-QR classifier: a patterns based approach for plant species identification

Bibliographic Details
Main Authors: More, Ravi Prabhakar; Mane, Rupali Chandrashekhar; Purohit, Hemant J.
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2016
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5148893/
https://www.ncbi.nlm.nih.gov/pubmed/27990177
http://dx.doi.org/10.1186/s13040-016-0120-6
Collection: PubMed
Abstract: BACKGROUND: DNA barcoding is a widely used and highly efficient approach that enables rapid and accurate identification of plant species based on short, standardized segments of the genome. The nucleotide sequences of the maturase K (matK) and ribulose-1,5-bisphosphate carboxylase (rbcL) marker loci are commonly used in plant species identification. Here, we present a new and highly efficient approach for identifying a unique set of discriminating nucleotide patterns to generate a signature (i.e., a regular expression) for plant species identification. METHODS: To generate molecular signatures, we used matK and rbcL loci datasets, which encompass 125 plant species in 52 genera reported by the CBOL plant working group. We first performed a Multiple Sequence Alignment (MSA) of all species, followed by a Position Specific Scoring Matrix (PSSM) analysis for both loci, to determine the percentage of discrimination among species. Next, we detected Discriminating Patterns (DP) at the genus and species levels using the PSSM for the matK dataset. By combining the DP with the distances between consecutive patterns, we generated molecular signatures for each species. Finally, we compared these signatures with existing methods, including BLASTn, Support Vector Machines (SVM), JRip (RIPPER), J48 (C4.5 algorithm), and Naïve Bayes (NB), on the NCBI GenBank matK dataset. RESULTS: Because matK achieved higher discrimination success than rbcL, we selected the matK gene for signature generation. We generated signatures for 60 species based on the discriminating patterns identified at the genus and species levels.
Our comparative assessment suggests that 46 of the 60 species could be correctly identified using the generated signatures, followed by BLASTn (34 species), SVM (18 species), C4.5 (7 species), NB (4 species), and RIPPER (3 species). As a final outcome of this study, we converted the signatures into QR codes and developed the matK-QR Classifier software (http://www.neeri.res.in/matk_classifier/index.htm), which searches for the signatures in query matK gene sequences and predicts the corresponding plant species. CONCLUSIONS: This novel approach of employing pattern-based signatures opens new avenues for species classification. We believe that, alongside existing methods, matK-QR Classifier would be a valuable tool for molecular taxonomists, enabling precise identification of plant species. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-016-0120-6) contains supplementary material, which is available to authorized users.
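The abstract describes each signature as a regular expression built from discriminating patterns (DP) and the distances between consecutive patterns, which the classifier then searches for in query matK sequences. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The helper names `build_signature` and `classify`, the optional `tolerance` on each distance, the use of `.` as the gap wildcard, and the toy patterns with `species_A`/`species_B` labels are all hypothetical.

```python
import re
from typing import Dict, List, Sequence


def build_signature(patterns: Sequence[str], gaps: Sequence[int], tolerance: int = 0) -> str:
    """Hypothetical helper: join discriminating patterns into one regular expression.

    gaps[i] is the assumed number of nucleotides between the end of patterns[i]
    and the start of patterns[i + 1]; tolerance widens each gap to gap +/- tolerance.
    """
    if not patterns:
        raise ValueError("at least one pattern is required")
    if len(gaps) != len(patterns) - 1:
        raise ValueError("need exactly one gap per pair of consecutive patterns")
    parts = [re.escape(patterns[0].upper())]
    for gap, pattern in zip(gaps, patterns[1:]):
        low, high = max(0, gap - tolerance), gap + tolerance
        # '.' (not [ACGT]) so that ambiguity codes such as N inside a gap still match
        parts.append(f".{{{low}}}" if low == high else f".{{{low},{high}}}")
        parts.append(re.escape(pattern.upper()))
    return "".join(parts)


def classify(query: str, signatures: Dict[str, str]) -> List[str]:
    """Return every label whose signature occurs anywhere in the query sequence."""
    sequence = query.upper()
    return [label for label, signature in signatures.items() if re.search(signature, sequence)]


# Toy signatures for two placeholder labels (not real matK data).
signatures = {
    "species_A": build_signature(["ATGGAA", "TTCGAT"], [4]),
    "species_B": build_signature(["ATGGAA", "CCGTTA"], [4], tolerance=1),
}

print(signatures["species_A"])                       # ATGGAA.{4}TTCGAT
print(classify("ccATGGAAgtcaTTCGATgg", signatures))  # ['species_A']
```

Returning every matching label, rather than a single prediction, keeps ambiguity visible when two signatures hit the same query; how the published tool resolves such cases is not stated in this record.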
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BioData Mining (article type: Research), BioMed Central, published 2016-12-09
Rights: © The Author(s) 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.