Clasnip: a web-based intraspecies classifier and multi-locus sequence typing for pathogenic microorganisms using fragmented sequences

Bioinformatic approaches for the identification of microorganisms have evolved rapidly, but existing methods are time-consuming, complicated or expensive for massive screening of pathogens and their non-pathogenic relatives. Also, bioinformatic classifiers usually lack automatically generated perfor...

Bibliographic Details
Main authors: Chuan, Jiacheng; Xu, Huimin; Hammill, Desmond L.; Hale, Lawrence; Chen, Wen; Li, Xiang
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Agricultural Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9835710/
https://www.ncbi.nlm.nih.gov/pubmed/36643626
http://dx.doi.org/10.7717/peerj.14490
_version_ 1784868723622936576
author Chuan, Jiacheng
Xu, Huimin
Hammill, Desmond L.
Hale, Lawrence
Chen, Wen
Li, Xiang
author_sort Chuan, Jiacheng
collection PubMed
description Bioinformatic approaches for the identification of microorganisms have evolved rapidly, but existing methods are time-consuming, complicated or expensive for massive screening of pathogens and their non-pathogenic relatives. Also, bioinformatic classifiers usually lack automatically generated performance statistics for specific databases. To address this problem, we developed Clasnip (www.clasnip.com), an easy-to-use web-based platform for the classification and similarity evaluation of closely related microorganisms at interspecies and intraspecies levels. Clasnip mainly consists of two modules: database building and sample classification. In database building, labeled nucleotide sequences are mapped to a reference sequence, and then single nucleotide polymorphisms (SNPs) statistics are generated. A probability model of SNPs and classification groups is built using Hidden Markov Models and solved using the maximum likelihood method. Database performance is estimated using three replicates of two-fold cross-validation. Sensitivity (recall), specificity (selectivity), precision, accuracy and other metrics are computed for all samples, training sets, and test sets. In sample classification, Clasnip accepts inputs of genes, short fragments, contigs and even whole genomes. It can report classification probability and a multi-locus sequence typing table for SNPs. The classification performance was tested using short sequences of 16S, 16–23S and 50S rRNA regions for 12 haplotypes of Candidatus Liberibacter solanacearum (CLso), a regulated plant pathogen associated with severe disease in economically important Apiaceous and Solanaceous crops. The program was able to classify CLso samples with even only 1–2 SNPs available, and achieved 97.2%, 98.8% and 100.0% accuracy based on 16S, 16–23S, and 50S rRNA sequences, respectively. 
In comparison with all existing 12 haplotypes, we proposed that to be classified as a new haplotype, given samples have at least 2 SNPs in the combined region of 16S rRNA (OA2/Lsc2) and 16–23S IGS (Lp Frag 4–1611F/Lp Frag 4–480R) regions, and 2 SNPs in the 50S rplJ/rplL (CL514F/CL514R) regions. Besides, we have included the databases for differentiating Dickeya spp., Pectobacterium spp. and Clavibacter spp. In addition to bacteria, we also tested Clasnip performance on potato virus Y (PVY). 251 PVY genomes were 100% correctly classified into seven groups (PVY(C), PVY(N), PVY(O), PVY(NTN), PVY(N:O), Poha, and Chile3). In conclusion, Clasnip is a statistically sound and user-friendly bioinformatic application for microorganism classification at the intraspecies level. Clasnip service is freely available at www.clasnip.com.
format Online
Article
Text
id pubmed-9835710
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-9835710 2023-01-13 Clasnip: a web-based intraspecies classifier and multi-locus sequence typing for pathogenic microorganisms using fragmented sequences Chuan, Jiacheng Xu, Huimin Hammill, Desmond L. Hale, Lawrence Chen, Wen Li, Xiang PeerJ Agricultural Science Bioinformatic approaches for the identification of microorganisms have evolved rapidly, but existing methods are time-consuming, complicated or expensive for massive screening of pathogens and their non-pathogenic relatives. Also, bioinformatic classifiers usually lack automatically generated performance statistics for specific databases. To address this problem, we developed Clasnip (www.clasnip.com), an easy-to-use web-based platform for the classification and similarity evaluation of closely related microorganisms at interspecies and intraspecies levels. Clasnip mainly consists of two modules: database building and sample classification. In database building, labeled nucleotide sequences are mapped to a reference sequence, and then single nucleotide polymorphisms (SNPs) statistics are generated. A probability model of SNPs and classification groups is built using Hidden Markov Models and solved using the maximum likelihood method. Database performance is estimated using three replicates of two-fold cross-validation. Sensitivity (recall), specificity (selectivity), precision, accuracy and other metrics are computed for all samples, training sets, and test sets. In sample classification, Clasnip accepts inputs of genes, short fragments, contigs and even whole genomes. It can report classification probability and a multi-locus sequence typing table for SNPs. The classification performance was tested using short sequences of 16S, 16–23S and 50S rRNA regions for 12 haplotypes of Candidatus Liberibacter solanacearum (CLso), a regulated plant pathogen associated with severe disease in economically important Apiaceous and Solanaceous crops.
The program was able to classify CLso samples with even only 1–2 SNPs available, and achieved 97.2%, 98.8% and 100.0% accuracy based on 16S, 16–23S, and 50S rRNA sequences, respectively. In comparison with all existing 12 haplotypes, we proposed that to be classified as a new haplotype, given samples have at least 2 SNPs in the combined region of 16S rRNA (OA2/Lsc2) and 16–23S IGS (Lp Frag 4–1611F/Lp Frag 4–480R) regions, and 2 SNPs in the 50S rplJ/rplL (CL514F/CL514R) regions. Besides, we have included the databases for differentiating Dickeya spp., Pectobacterium spp. and Clavibacter spp. In addition to bacteria, we also tested Clasnip performance on potato virus Y (PVY). 251 PVY genomes were 100% correctly classified into seven groups (PVY(C), PVY(N), PVY(O), PVY(NTN), PVY(N:O), Poha, and Chile3). In conclusion, Clasnip is a statistically sound and user-friendly bioinformatic application for microorganism classification at the intraspecies level. Clasnip service is freely available at www.clasnip.com. PeerJ Inc. 2023-01-09 /pmc/articles/PMC9835710/ /pubmed/36643626 http://dx.doi.org/10.7717/peerj.14490 Text en © 2023 Chuan et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Clasnip: a web-based intraspecies classifier and multi-locus sequence typing for pathogenic microorganisms using fragmented sequences
topic Agricultural Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9835710/
https://www.ncbi.nlm.nih.gov/pubmed/36643626
http://dx.doi.org/10.7717/peerj.14490