A new bin size index method for statistical analysis of multimodal datasets from materials characterization

Bibliographic Details
Main Authors: Jiang, Tao; Luo, Shengmin; Wang, Dongfang; Li, Yucheng; Wu, Yongkang; He, Li; Zhang, Guoping
Format: Online; Article; Text
Language: English
Published: Nature Publishing Group UK 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10322845/
https://www.ncbi.nlm.nih.gov/pubmed/37407657
http://dx.doi.org/10.1038/s41598-023-37969-2
Collection: PubMed
Description: This paper presents a statistical data binning method based on the normalized standard error, termed “bin size index” (BSI), which yields an optimized, objective bin size for constructing a rational histogram. This histogram facilitates the subsequent deconvolution of multimodal datasets from materials characterization and hence the determination of the underlying probability density functions. A total of ten datasets were used to illustrate the BSI’s concepts and algorithms: four normally distributed synthetic datasets, three normally distributed datasets on the elasticity of rocks obtained by statistical nanoindentation, and three lognormally distributed datasets on the particle size distributions of flocculated clay suspensions. Results from the synthetic datasets prove the method’s accuracy and effectiveness, while analyses of the real datasets from materials characterization and measurement further demonstrate its rationale, performance, and applicability to practical problems. The BSI method also enables determination of the number of modes by comparing the errors returned from different trial bin sizes. The accuracy and performance of the BSI method are also compared with those of other widely used binning methods; the BSI method yields the highest BSI and the smallest normalized standard errors. In particular, by normalizing the errors by the number of modes hidden in the datasets, the new method penalizes overfitting, which tends to yield too many pseudo-modes, and it also eliminates the difficulty of specifying criteria for acceptable fitting errors. The advantages and disadvantages of the new method are also discussed.
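The abstract does not state the BSI formula, so the sketch below does not implement BSI. It is a minimal, stdlib-only Python illustration, under stated assumptions, of the general workflow the abstract describes: build histograms at trial bin sizes, deconvolve the data into Gaussian modes, and compare the fitting errors across trial bin sizes and numbers of modes. Assumptions (none of these come from the paper): synthetic two-mode data; three classical rules (Sturges, Scott, Freedman-Diaconis) as the trial widths, since the abstract does not name which "widely used binning methods" were compared; expectation-maximization as a swapped-in deconvolution step, whereas the paper's fitting procedure may differ; and a plain root-mean-square residual in place of the paper's normalized standard error. All function names are hypothetical.

```python
import math
import random
import statistics

# Hypothetical helpers for illustration only; this is NOT the paper's BSI formula.


def rule_widths(data):
    """Bin widths from three classical rules: Sturges, Scott, Freedman-Diaconis."""
    n = len(data)
    span = max(data) - min(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "sturges": span / (math.ceil(math.log2(n)) + 1),
        "scott": 3.49 * statistics.stdev(data) * n ** (-1 / 3),
        "freedman_diaconis": 2 * (q3 - q1) * n ** (-1 / 3),
    }


def density_histogram(data, width):
    """Return bin centers and probability densities for bins of equal width."""
    lo = min(data)
    n_bins = max(1, math.ceil((max(data) - lo) / width))
    counts = [0] * n_bins
    for x in data:
        # Clamp the index so the maximum value lands in the last bin.
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    densities = [c / (len(data) * width) for c in counts]
    return centers, densities


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_gaussian_mixture(data, k, iterations=100):
    """Deconvolve data into k Gaussian modes by expectation-maximization (assumed stand-in)."""
    xs = sorted(data)
    n = len(xs)
    # Start the means at evenly spaced quantiles of the sorted data.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    sigmas = [statistics.stdev(xs) / k] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each mode for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each mode's weight, mean, and standard deviation.
        for j in range(k):
            rj = [r[j] for r in resp]
            nj = max(sum(rj), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r * x for r, x in zip(rj, xs)) / nj
            var = sum(r * (x - means[j]) ** 2 for r, x in zip(rj, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)
    return list(zip(weights, means, sigmas))


def histogram_rms_error(data, width, mixture):
    """Root-mean-square gap between histogram densities and the fitted mixture PDF."""
    centers, densities = density_histogram(data, width)
    fitted = [sum(w * normal_pdf(c, m, s) for w, m, s in mixture) for c in centers]
    return math.sqrt(sum((d - f) ** 2 for d, f in zip(densities, fitted)) / len(centers))


if __name__ == "__main__":
    random.seed(0)
    # Synthetic two-mode data (assumed for illustration, not from the paper).
    data = [random.gauss(10, 1.0) for _ in range(500)] + [random.gauss(20, 1.5) for _ in range(500)]
    widths = rule_widths(data)
    for k in (1, 2, 3):
        mixture = fit_gaussian_mixture(data, k)
        for name, width in widths.items():
            err = histogram_rms_error(data, width, mixture)
            print(f"k={k} {name:17s} width={width:.3f} rms={err:.4f}")
```

Run as a script, it prints one residual for each trial number of modes `k` and each rule width. Because this raw residual is not penalized for extra modes, it cannot by itself rule out pseudo-modes; according to the abstract, BSI addresses this by normalizing the errors by the number of modes.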
ID: pubmed-10322845
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: Sci Rep (Nature Publishing Group UK), 2023-07-05
Rights: © The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.