
Probabilistically sampled and spectrally clustered plant species using phenotypic characteristics

The phenotypic characteristics of a plant species are its physical properties as cataloged by plant biologists at different research centers around the world. Clustering species based upon their phenotypic characteristics is used to obtain diverse sets of parents that are useful in their breeding...


Bibliographic Details
Main authors:	Shastri, Aditya A.; Ahuja, Kapil; Ratnaparkhe, Milind B.; Busnel, Yann
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Bioinformatics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8432307/ https://www.ncbi.nlm.nih.gov/pubmed/34589292 http://dx.doi.org/10.7717/peerj.11927

_version_	1783751132243296256
author	Shastri, Aditya A.; Ahuja, Kapil; Ratnaparkhe, Milind B.; Busnel, Yann
author_sort	Shastri, Aditya A.
collection	PubMed
description	The phenotypic characteristics of a plant species are its physical properties as cataloged by plant biologists at different research centers around the world. Clustering species based upon their phenotypic characteristics is used to obtain diverse sets of parents that are useful in their breeding programs. The Hierarchical Clustering (HC) algorithm is the current standard for clustering phenotypic data. This algorithm suffers from low accuracy and high computational complexity. To address the accuracy challenge, we propose the use of the Spectral Clustering (SC) algorithm. To make the algorithm computationally cheap, we propose using sampling, specifically Pivotal Sampling, which is probability based. Since the application of sampling to phenotypic data has not been explored much, another sampling technique called Vector Quantization (VQ) is also adapted for this data to allow an effective comparison. VQ has recently generated promising results for genotypic data. The novelty of our SC with Pivotal Sampling algorithm is in constructing the crucial similarity matrix for the clustering algorithm and defining probabilities for the sampling technique. Although our algorithm can be applied to any plant species, we tested it on the phenotypic data obtained from about 2,400 Soybean species. SC with Pivotal Sampling achieves substantially higher accuracy (in terms of Silhouette Values) than all the other proposed competitive clustering with sampling algorithms (i.e., SC with VQ, HC with Pivotal Sampling, and HC with VQ). The complexities of our SC with Pivotal Sampling algorithm and these three variants are almost the same because of the sampling involved. In addition, SC with Pivotal Sampling outperforms the standard HC algorithm in both accuracy and computational complexity. We experimentally show that we are up to 45% more accurate than HC in terms of clustering accuracy. The computational complexity of our algorithm is more than an order of magnitude lower than that of HC.
format	Online Article Text
id	pubmed-8432307
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8432307 2021-09-28 Probabilistically sampled and spectrally clustered plant species using phenotypic characteristics Shastri, Aditya A.; Ahuja, Kapil; Ratnaparkhe, Milind B.; Busnel, Yann PeerJ Bioinformatics PeerJ Inc. 2021-09-07 /pmc/articles/PMC8432307/ /pubmed/34589292 http://dx.doi.org/10.7717/peerj.11927 Text en © 2021 Shastri et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	Probabilistically sampled and spectrally clustered plant species using phenotypic characteristics
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8432307/ https://www.ncbi.nlm.nih.gov/pubmed/34589292 http://dx.doi.org/10.7717/peerj.11927
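
The abstract in this record names Pivotal Sampling as the probability-based step that shrinks the data before clustering. As a rough illustration only, the sketch below implements the generic sequential pivotal method in plain Python. It is not the authors' code: the function name `pivotal_sample`, the tolerance `eps`, and the equal inclusion probabilities in the usage lines are assumptions made for this example, and the record does not say how the paper defines its probabilities.

```python
import random


def pivotal_sample(probs, rng=None, eps=1e-9):
    """Sequential pivotal sampling (generic textbook form, hypothetical helper).

    probs: inclusion probabilities in [0, 1]; when they sum to an integer n,
    exactly n units are selected. Returns the sorted indices of selected units.
    """
    rng = rng or random.Random()
    p = [float(x) for x in probs]

    def fractional(x):
        # A unit is undecided while its probability is strictly between 0 and 1.
        return eps < x < 1.0 - eps

    i = None  # index of the current undecided unit
    for j in range(len(p)):
        if not fractional(p[j]):
            continue
        if i is None:
            i = j
            continue
        s = p[i] + p[j]
        if s < 1.0:
            # One unit is rejected; the other carries the combined probability.
            if rng.random() < p[i] / s:
                p[i], p[j] = s, 0.0
            else:
                p[i], p[j] = 0.0, s
        else:
            # One unit is selected; the other keeps the remainder s - 1.
            if rng.random() < (1.0 - p[j]) / (2.0 - s):
                p[i], p[j] = 1.0, s - 1.0
            else:
                p[i], p[j] = s - 1.0, 1.0
        # Whichever unit is still undecided (if any) fights the next duel.
        if fractional(p[i]):
            pass
        elif fractional(p[j]):
            i = j
        else:
            i = None

    # If the probabilities did not sum to an integer, settle the leftover unit.
    if i is not None:
        p[i] = 1.0 if rng.random() < p[i] else 0.0

    return [k for k, x in enumerate(p) if x > 1.0 - eps]


# Usage: draw 2 of 8 units, each with (assumed) inclusion probability 0.25.
sample = pivotal_sample([0.25] * 8, rng=random.Random(0))
print(sample)
```

In a pipeline like the one the abstract describes, the selected indices would pick the rows passed on to the clustering step. Each duel preserves the expected inclusion probability of both units, which is why the sample size is fixed while individual selections stay random.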
