
Vector Quantized Spectral Clustering Applied to Whole Genome Sequences of Plants

We develop a Vector Quantized Spectral Clustering (VQSC) algorithm that combines spectral clustering (SC) with vector quantization (VQ) sampling to group plant genome sequences. The idea is to use SC for its accuracy and VQ to keep the algorithm computationally cheap (t...


Bibliographic Details
Main authors: Shastri, Aditya A; Ahuja, Kapil; Ratnaparkhe, Milind B; Shah, Aditya; Gagrani, Aishwary; Lal, Anant
Format: Online Article Text
Language: English
Published: SAGE Publications 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6435876/
https://www.ncbi.nlm.nih.gov/pubmed/30936678
http://dx.doi.org/10.1177/1176934319836997
author Shastri, Aditya A
Ahuja, Kapil
Ratnaparkhe, Milind B
Shah, Aditya
Gagrani, Aishwary
Lal, Anant
collection PubMed
description We develop a Vector Quantized Spectral Clustering (VQSC) algorithm that combines spectral clustering (SC) with vector quantization (VQ) sampling to group plant genome sequences. The idea is to use SC for its accuracy and VQ to keep the algorithm computationally cheap (the complexity of SC is cubic in the input size). Although combining SC and VQ is not new, the novelty of our work lies in constructing the crucial similarity matrix in SC and in using k-medoids in VQ, both adapted to plant genome data. For soybean, we compare our approach with two commonly used techniques, the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Neighbor Joining (NJ). Experimental results show that VQSC significantly outperforms both techniques in cluster quality (average improvement of 21% over UPGMA and 24% over NJ) and in computation time (an order of magnitude faster than both UPGMA and NJ).
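The pipeline outlined in the description (k-medoids to pick representatives, spectral clustering on the representatives only, then label propagation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not describe the genome-specific similarity matrix, so a Gaussian kernel on a generic precomputed distance matrix stands in for it, and all function names (`k_medoids`, `spectral_labels`, `lloyd_kmeans`, `vqsc`), the initialisation scheme, and the `sigma` heuristic are hypothetical choices. The toy input uses points on a line in place of sequence distances.

```python
import numpy as np

def k_medoids(D, m, n_iter=50):
    """Voronoi-iteration k-medoids on a precomputed distance matrix D (VQ step)."""
    # Farthest-point initialisation: an illustrative choice, not from the paper.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < m:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        cell = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(m):
            members = np.where(cell == j)[0]
            if members.size == 0:
                continue  # keep the old medoid for an empty cell
            # New medoid: the member with the smallest total distance to its cell.
            within = D[np.ix_(members, members)].sum(axis=1)
            new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)

def lloyd_kmeans(X, k, n_iter=100):
    """Plain Lloyd k-means with farthest-point initialisation."""
    centers = [X[0]]
    while len(centers) < k:
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        gaps = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(gaps, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_labels(D, k, sigma):
    """Normalized spectral clustering on a small distance matrix (SC step)."""
    # ASSUMPTION: Gaussian kernel as a stand-in similarity matrix.
    W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    U = vecs[:, -k:]                     # eigenvectors of the k largest
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return lloyd_kmeans(U, k)

def vqsc(D, k, m, sigma=None):
    """Cluster n items into k groups via m representatives (requires 2 <= k <= m <= n)."""
    medoids, cell = k_medoids(D, m)
    Dm = D[np.ix_(medoids, medoids)]     # m x m instead of n x n
    if sigma is None:
        # ASSUMPTION: heuristic scale, median distance to the nearest other medoid.
        nearest = np.where(np.eye(m, dtype=bool), np.inf, Dm).min(axis=1)
        sigma = float(np.median(nearest))
    rep_labels = spectral_labels(Dm, k, sigma)
    return rep_labels[cell]              # each item inherits its medoid's label

# Toy input: two well-separated groups of ten points on a line.
x = np.concatenate([np.arange(10) * 0.1, 10.0 + np.arange(10) * 0.1])
D = np.abs(x[:, None] - x[None, :])
labels = vqsc(D, k=2, m=4)
print(labels)
```

The eigendecomposition runs on the m × m matrix of representatives rather than the full n × n matrix, which is where the saving over plain SC (cubic in the input size) comes from when m is much smaller than n.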
format Online
Article
Text
id pubmed-6435876
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-6435876 2019-04-01
Evol Bioinform Online
Rapid Communication
SAGE Publications 2019-03-26
/pmc/articles/PMC6435876/ /pubmed/30936678 http://dx.doi.org/10.1177/1176934319836997
Text en
© The Author(s) 2019 http://www.creativecommons.org/licenses/by-nc/4.0/
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
topic Rapid Communication