A new bin size index method for statistical analysis of multimodal datasets from materials characterization
This paper presents a normalized standard error-based statistical data binning method, termed “bin size index” (BSI), which yields an optimized, objective bin size for constructing a rational histogram to facilitate subsequent deconvolution of multimodal datasets from materials characterization and...
Main authors: | Jiang, Tao; Luo, Shengmin; Wang, Dongfang; Li, Yucheng; Wu, Yongkang; He, Li; Zhang, Guoping |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10322845/ https://www.ncbi.nlm.nih.gov/pubmed/37407657 http://dx.doi.org/10.1038/s41598-023-37969-2 |
_version_ | 1785068846875410432 |
---|---|
author | Jiang, Tao Luo, Shengmin Wang, Dongfang Li, Yucheng Wu, Yongkang He, Li Zhang, Guoping |
author_facet | Jiang, Tao Luo, Shengmin Wang, Dongfang Li, Yucheng Wu, Yongkang He, Li Zhang, Guoping |
author_sort | Jiang, Tao |
collection | PubMed |
description | This paper presents a normalized standard error-based statistical data binning method, termed “bin size index” (BSI), which yields an optimized, objective bin size for constructing a rational histogram to facilitate subsequent deconvolution of multimodal datasets from materials characterization and hence the determination of the underlying probability density functions. A total of ten datasets, including four normally-distributed synthetic ones, three normally-distributed ones on the elasticity of rocks obtained by statistical nanoindentation, and three lognormally-distributed ones on the particle size distributions of flocculated clay suspensions, were used to illustrate the BSI’s concepts and algorithms. While results from the synthetic datasets prove the method’s accuracy and effectiveness, analyses of the real datasets from materials characterization and measurement further demonstrate its rationale, performance, and applicability to practical problems. The BSI method also enables determination of the number of modes via the comparative evaluation of the errors returned from different trial bin sizes. The accuracy and performance of the BSI method are further compared with those of other widely used binning methods; the BSI method yields the highest BSI and smallest normalized standard errors. This new method particularly penalizes overfitting, which tends to yield too many pseudo-modes, by normalizing the errors by the number of modes hidden in the datasets, and also eliminates the difficulty in specifying criteria for acceptable values of the fitting errors. The advantages and disadvantages of the new method are also discussed. |
format | Online Article Text |
id | pubmed-10322845 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-103228452023-07-07 A new bin size index method for statistical analysis of multimodal datasets from materials characterization Jiang, Tao Luo, Shengmin Wang, Dongfang Li, Yucheng Wu, Yongkang He, Li Zhang, Guoping Sci Rep Article This paper presents a normalized standard error-based statistical data binning method, termed “bin size index” (BSI), which yields an optimized, objective bin size for constructing a rational histogram to facilitate subsequent deconvolution of multimodal datasets from materials characterization and hence the determination of the underlying probability density functions. Totally ten datasets, including four normally-distributed synthetic ones, three normally-distributed ones on the elasticity of rocks obtained by statistical nanoindentation, and three lognormally-distributed ones on the particle size distributions of flocculated clay suspensions, were used to illustrate the BSI’s concepts and algorithms. While results from the synthetic datasets prove the method’s accuracy and effectiveness, analyses of other real datasets from materials characterization and measurement further demonstrate its rationale, performance, and applicability to practical problems. The BSI method also enables determination of the number of modes via the comparative evaluation of the errors returned from different trial bin sizes. The accuracy and performance of the BSI method are further compared with other widely used binning methods, and the former yields the highest BSI and smallest normalized standard errors. This new method particularly penalizes the overfitting that tends to yield too many pseudo-modes via normalizing the errors by the number of modes hidden in the datasets, and also eliminates the difficulty in specifying criteria for acceptable values of the fitting errors. The advantages and disadvantages of the new method are also discussed. 
Nature Publishing Group UK 2023-07-05 /pmc/articles/PMC10322845/ /pubmed/37407657 http://dx.doi.org/10.1038/s41598-023-37969-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Jiang, Tao Luo, Shengmin Wang, Dongfang Li, Yucheng Wu, Yongkang He, Li Zhang, Guoping A new bin size index method for statistical analysis of multimodal datasets from materials characterization |
title | A new bin size index method for statistical analysis of multimodal datasets from materials characterization |
title_full | A new bin size index method for statistical analysis of multimodal datasets from materials characterization |
title_fullStr | A new bin size index method for statistical analysis of multimodal datasets from materials characterization |
title_full_unstemmed | A new bin size index method for statistical analysis of multimodal datasets from materials characterization |
title_short | A new bin size index method for statistical analysis of multimodal datasets from materials characterization |
title_sort | new bin size index method for statistical analysis of multimodal datasets from materials characterization |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10322845/ https://www.ncbi.nlm.nih.gov/pubmed/37407657 http://dx.doi.org/10.1038/s41598-023-37969-2 |
work_keys_str_mv | AT jiangtao anewbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT luoshengmin anewbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT wangdongfang anewbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT liyucheng anewbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT wuyongkang anewbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT heli anewbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT zhangguoping anewbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT jiangtao newbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT luoshengmin newbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT wangdongfang newbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT liyucheng newbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT wuyongkang newbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT heli newbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization AT zhangguoping newbinsizeindexmethodforstatisticalanalysisofmultimodaldatasetsfrommaterialscharacterization |
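
The description field in this record outlines the idea of scoring several trial bin sizes by a fitting error and keeping the size with the best score. The Python sketch below illustrates only that general idea, under loudly stated assumptions: the record does not give the BSI formula, so the `normalized_error` score used here (RMS misfit between a density histogram and a known mixture PDF, divided by the peak density) is a hypothetical stand-in and not the authors' index. The synthetic two-mode dataset, its parameters, and the list of trial widths are likewise invented for illustration, and the normalization by number of modes mentioned in the description is not reproduced.

```python
import math
import random
from statistics import NormalDist


def mixture_pdf(x, components):
    """Density of a weighted normal mixture; components = [(weight, mean, sd), ...]."""
    return sum(w * NormalDist(mu, sd).pdf(x) for w, mu, sd in components)


def density_histogram(data, bin_width):
    """Return (bin_centers, densities) for equal-width bins starting at min(data)."""
    lo = min(data)
    n_bins = max(1, math.ceil((max(data) - lo) / bin_width))
    counts = [0] * n_bins
    for x in data:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((x - lo) / bin_width), n_bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * bin_width for i in range(n_bins)]
    densities = [c / (len(data) * bin_width) for c in counts]
    return centers, densities


def normalized_error(data, bin_width, components):
    """Hypothetical score (NOT the paper's BSI): RMS misfit between the
    histogram densities and the mixture PDF at the bin centers, divided by
    the largest PDF value seen at those centers."""
    centers, densities = density_histogram(data, bin_width)
    pdf_values = [mixture_pdf(c, components) for c in centers]
    squared = sum((d - p) ** 2 for d, p in zip(densities, pdf_values))
    rms = math.sqrt(squared / len(centers))
    return rms / max(pdf_values)


# Assumed synthetic dataset: two normal modes, 1000 samples each.
random.seed(0)
components = [(0.5, 10.0, 1.0), (0.5, 16.0, 1.5)]
data = [
    random.gauss(mu, sd)
    for w, mu, sd in components
    for _ in range(int(w * 2000))
]

# Assumed trial bin widths; score each one and keep the smallest error.
trial_widths = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
errors = {bw: normalized_error(data, bw, components) for bw in trial_widths}
best_width = min(errors, key=errors.get)

for bw in trial_widths:
    print(f"bin width {bw:>5}: normalized error {errors[bw]:.4f}")
print("selected bin width:", best_width)
```

Very narrow bins give a noisy histogram and very wide bins smear the two modes together, so the score in this sketch is typically smallest at an intermediate width. With real measurements the mixture parameters are unknown and would have to be fitted for each trial bin size, which this sketch does not attempt.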