
Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines

BACKGROUND: The binary similarity and dissimilarity measures have critical roles in the processing of data consisting of binary vectors in various fields including bioinformatics and chemometrics. These metrics express the similarity and dissimilarity values between two binary vectors in terms of th...


Bibliographic Details
Main authors: Wijaya, Sony Hartono, Afendi, Farit Mochamad, Batubara, Irmanida, Darusman, Latifah K., Altaf-Ul-Amin, Md, Kanaya, Shigehiko
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142342/
https://www.ncbi.nlm.nih.gov/pubmed/27927171
http://dx.doi.org/10.1186/s12859-016-1392-z
author Wijaya, Sony Hartono
Afendi, Farit Mochamad
Batubara, Irmanida
Darusman, Latifah K.
Altaf-Ul-Amin, Md
Kanaya, Shigehiko
collection PubMed
description BACKGROUND: The binary similarity and dissimilarity measures have critical roles in the processing of data consisting of binary vectors in various fields including bioinformatics and chemometrics. These metrics express the similarity and dissimilarity values between two binary vectors in terms of the positive matches, absence mismatches or negative matches. To our knowledge, there is no published work presenting a systematic way of finding an appropriate equation to measure binary similarity that performs well for a certain data type or application. A proper method to select a suitable binary similarity or dissimilarity measure is needed to obtain better classification results. RESULTS: In this study, we proposed a novel approach to select binary similarity and dissimilarity measures. We collected 79 binary similarity and dissimilarity equations through an extensive literature search and implemented those equations as an R package called bmeasures. We applied these metrics to quantify the similarity and dissimilarity between herbal medicine formulas belonging to the Indonesian Jamu and Japanese Kampo separately. We assessed the capability of binary equations to classify herbal medicine pairs into match and mismatch efficacies based on their similarity or dissimilarity coefficients using the Receiver Operating Characteristic (ROC) curve analysis. According to the area under the ROC curve results, the Indonesian Jamu and Japanese Kampo datasets yielded different rankings of binary similarity and dissimilarity measures. Out of all the equations, the Forbes-2 similarity and the Variant of Correlation similarity measures are recommended for studying the relationship between Jamu formulas and Kampo formulas, respectively. CONCLUSIONS: The selection of binary similarity and dissimilarity measures for multivariate analysis is data dependent. The proposed method can be used to find the most suitable binary similarity and dissimilarity equation for a particular dataset. Our findings suggest that all four types of matching quantities in the Operational Taxonomic Unit (OTU) table are important for calculating the similarity and dissimilarity coefficients between herbal medicine formulas. Also, the binary similarity and dissimilarity measures that include the negative match quantity d separate herbal medicine pairs better than equations that exclude d. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1392-z) contains supplementary material, which is available to authorized users.
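The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the bmeasures R package and does not reproduce its API. It assumes the common convention that a counts features present in both vectors, b and c count features present in only one of them, and d counts features absent from both. It shows one well-known measure that excludes d (Jaccard) and one that includes d (simple matching), plus a pairwise-comparison estimate of the area under the ROC curve. All function names, vectors, and scores below are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def otu_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length 0/1 vectors.

    Assumed convention: a = both 1, b = only x is 1,
    c = only y is 1, d = both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard similarity a / (a + b + c); ignores negative matches d."""
    a, b, c, _ = otu_counts(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching (a + d) / (a + b + c + d); uses negative matches d."""
    a, b, c, d = otu_counts(x, y)
    n = a + b + c + d
    return (a + d) / n if n else 0.0


def auc(match_scores: Sequence[float], mismatch_scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (match, mismatch) pairs
    in which the match pair scores higher; ties count as one half."""
    wins = 0.0
    for m in match_scores:
        for n in mismatch_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(match_scores) * len(mismatch_scores))


# Hypothetical formulas encoded as presence/absence of eight ingredients.
formula_1 = [1, 1, 0, 0, 1, 0, 0, 0]
formula_2 = [1, 0, 0, 0, 1, 1, 0, 0]

print(otu_counts(formula_1, formula_2))       # (2, 1, 1, 4)
print(jaccard(formula_1, formula_2))          # 0.5
print(simple_matching(formula_1, formula_2))  # 0.75

# Hypothetical similarity scores for formula pairs with matching and
# mismatching efficacy; a higher AUC means the measure separates them better.
print(auc([0.8, 0.6, 0.4], [0.5, 0.3, 0.2]))
```

In this toy case the two measures disagree (0.5 versus 0.75) only because of the four shared absences counted in d, which is the kind of difference the ROC-based ranking described above is meant to evaluate on real data.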
format Online
Article
Text
id pubmed-5142342
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5142342 2016-12-15 BMC Bioinformatics Methodology Article BioMed Central 2016-12-07 /pmc/articles/PMC5142342/ /pubmed/27927171 http://dx.doi.org/10.1186/s12859-016-1392-z Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines
topic Methodology Article