Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines
BACKGROUND: The binary similarity and dissimilarity measures have critical roles in the processing of data consisting of binary vectors in various fields including bioinformatics and chemometrics. These metrics express the similarity and dissimilarity values between two binary vectors in terms of th...
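Main authors: | Wijaya, Sony Hartono; Afendi, Farit Mochamad; Batubara, Irmanida; Darusman, Latifah K.; Altaf-Ul-Amin, Md; Kanaya, Shigehiko |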
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142342/ https://www.ncbi.nlm.nih.gov/pubmed/27927171 http://dx.doi.org/10.1186/s12859-016-1392-z |
_version_ | 1782472754779389952 |
---|---|
author | Wijaya, Sony Hartono; Afendi, Farit Mochamad; Batubara, Irmanida; Darusman, Latifah K.; Altaf-Ul-Amin, Md; Kanaya, Shigehiko |
author_facet | Wijaya, Sony Hartono; Afendi, Farit Mochamad; Batubara, Irmanida; Darusman, Latifah K.; Altaf-Ul-Amin, Md; Kanaya, Shigehiko |
author_sort | Wijaya, Sony Hartono |
collection | PubMed |
description | BACKGROUND: Binary similarity and dissimilarity measures play critical roles in the processing of data consisting of binary vectors in various fields, including bioinformatics and chemometrics. These metrics express the similarity and dissimilarity values between two binary vectors in terms of the positive matches, absence mismatches or negative matches. To our knowledge, there is no published work presenting a systematic way of finding an appropriate equation to measure binary similarity that performs well for a certain data type or application. A proper method to select a suitable binary similarity or dissimilarity measure is needed to obtain better classification results. RESULTS: In this study, we proposed a novel approach to select binary similarity and dissimilarity measures. We collected 79 binary similarity and dissimilarity equations by extensive literature search and implemented those equations as an R package called bmeasures. We applied these metrics to quantify the similarity and dissimilarity between herbal medicine formulas belonging to the Indonesian Jamu and Japanese Kampo separately. We assessed the capability of binary equations to classify herbal medicine pairs into match and mismatch efficacies based on their similarity or dissimilarity coefficients using Receiver Operating Characteristic (ROC) curve analysis. According to the area under the ROC curve results, the Indonesian Jamu and Japanese Kampo datasets yielded different rankings of binary similarity and dissimilarity measures. Out of all the equations, the Forbes-2 similarity and the Variant of Correlation similarity measures are recommended for studying the relationship between Jamu formulas and Kampo formulas, respectively. CONCLUSIONS: The selection of binary similarity and dissimilarity measures for multivariate analysis is data dependent. The proposed method can be used to find the most suitable binary similarity and dissimilarity equation for a particular dataset. Our findings suggest that all four types of matching quantities in the Operational Taxonomic Unit (OTU) table are important to calculate the similarity and dissimilarity coefficients between herbal medicine formulas. Also, the binary similarity and dissimilarity measures that include the negative match quantity d achieve better capability to separate herbal medicine pairs compared to equations that exclude d. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1392-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5142342 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5142342 2016-12-15 Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines Wijaya, Sony Hartono; Afendi, Farit Mochamad; Batubara, Irmanida; Darusman, Latifah K.; Altaf-Ul-Amin, Md; Kanaya, Shigehiko BMC Bioinformatics Methodology Article BioMed Central 2016-12-07 /pmc/articles/PMC5142342/ /pubmed/27927171 http://dx.doi.org/10.1186/s12859-016-1392-z Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Wijaya, Sony Hartono Afendi, Farit Mochamad Batubara, Irmanida Darusman, Latifah K. Altaf-Ul-Amin, Md Kanaya, Shigehiko Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines |
title | Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142342/ https://www.ncbi.nlm.nih.gov/pubmed/27927171 http://dx.doi.org/10.1186/s12859-016-1392-z |
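
The description in this record refers to four matching quantities in the OTU table and notes that measures including the negative match quantity d separated formula pairs better than measures excluding it. A minimal sketch of that distinction follows, in Python rather than with the bmeasures R package, whose interface is not shown in this record. The function names, the example vectors, and the assignment of b and c to the two kinds of mismatch are assumptions for illustration; Jaccard (which ignores d) and simple matching (which includes d) are standard coefficients chosen as examples, not the measures the article recommends.

```python
from typing import Sequence, Tuple


def otu_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length 0/1 vectors.

    a: present in both (positive matches)
    b: present in x only   (assumed labelling of the mismatches)
    c: present in y only   (assumed labelling of the mismatches)
    d: absent in both (negative matches)
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard similarity a / (a + b + c); the negative matches d are ignored."""
    a, b, c, _ = otu_counts(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching similarity (a + d) / (a + b + c + d); d is included."""
    a, b, c, d = otu_counts(x, y)
    total = a + b + c + d
    return (a + d) / total if total else 0.0


# Hypothetical formulas: 1 means the ingredient is used, 0 means it is not.
formula_1 = [1, 1, 0, 0, 1, 0]
formula_2 = [1, 0, 0, 0, 1, 1]

print(otu_counts(formula_1, formula_2))
print(jaccard(formula_1, formula_2))
print(simple_matching(formula_1, formula_2))
```

Because herbal formulas typically use only a few of the possible ingredients, d tends to dominate the counts, so whether a coefficient includes d can change the ranking of formula pairs considerably.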