Utilization of five data mining algorithms combined with simplified preprocessing to establish reference intervals of thyroid-related hormones for non-elderly adults
BACKGROUND: Despite the extensive research on data mining algorithms, there is still a lack of a standard protocol to evaluate the performance of the existing algorithms. Therefore, the study aims to provide a novel procedure that combines data mining algorithms and simplified preprocessing to estab...
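Main authors: | Zhong, Jian; Ma, Chaochao; Hou, Li’an; Yin, Yicong; Zhao, Fang; Hu, Yingying; Song, Ailing; Wang, Danchen; Li, Lei; Cheng, Xinqi; Qiu, Ling |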
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10152698/ https://www.ncbi.nlm.nih.gov/pubmed/37131135 http://dx.doi.org/10.1186/s12874-023-01898-5 |
_version_ | 1785035791887499264 |
---|---|
author | Zhong, Jian; Ma, Chaochao; Hou, Li’an; Yin, Yicong; Zhao, Fang; Hu, Yingying; Song, Ailing; Wang, Danchen; Li, Lei; Cheng, Xinqi; Qiu, Ling |
author_sort | Zhong, Jian |
collection | PubMed |
description | BACKGROUND: Despite extensive research on data mining algorithms, there is still no standard protocol for evaluating the performance of the existing algorithms. This study therefore aims to provide a novel procedure that combines data mining algorithms with simplified preprocessing to establish reference intervals (RIs), and to assess the performance of five algorithms objectively. METHODS: Two data sets were derived from a population undergoing physical examination. The Hoffmann, Bhattacharya, expectation-maximization (EM), kosmic, and refineR algorithms, each combined with two-step data preprocessing, were applied to the Test data set to establish RIs for thyroid-related hormones. The algorithm-calculated RIs were compared with the standard RIs calculated from the Reference data set, in which reference individuals were selected following strict inclusion and exclusion criteria. The methods were assessed objectively using the bias ratio (BR) matrix. RESULTS: RIs of thyroid-related hormones were established. The TSH RIs established by the EM algorithm were highly consistent with the standard TSH RIs (BR = 0.063), although the EM algorithm seemed to perform poorly on the other hormones. The RIs calculated by the Hoffmann, Bhattacharya, and refineR methods for free and total triiodothyronine and for free and total thyroxine were close to one another and matched the standard RIs. CONCLUSION: An effective approach for objectively evaluating algorithm performance based on the BR matrix was established. The EM algorithm combined with simplified preprocessing can handle data with significant skewness, but its performance is limited in other scenarios. The other four algorithms perform well for data with a Gaussian or near-Gaussian distribution. Choosing the algorithm according to the distribution characteristics of the data is recommended. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-023-01898-5. |
format | Online Article Text |
id | pubmed-10152698 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10152698 2023-05-03 BMC Med Res Methodol Research BioMed Central 2023-05-02 /pmc/articles/PMC10152698/ /pubmed/37131135 http://dx.doi.org/10.1186/s12874-023-01898-5 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Utilization of five data mining algorithms combined with simplified preprocessing to establish reference intervals of thyroid-related hormones for non-elderly adults |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10152698/ https://www.ncbi.nlm.nih.gov/pubmed/37131135 http://dx.doi.org/10.1186/s12874-023-01898-5 |
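
The abstract in this record names a bias ratio (BR) matrix as the yardstick for comparing algorithm-derived RIs with the standard RIs, but the record does not give the formula. The sketch below is only an illustration under an assumed definition: each limit's absolute difference from the standard limit is divided by the standard deviation implied by the standard RI, taken as (upper - lower) / 3.92 for a Gaussian central 95% interval. The names `ReferenceInterval`, `bias_ratio`, and `bias_ratio_matrix` are hypothetical, and the paper's actual definition may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceInterval:
    """Lower and upper reference limits of one analyte (hypothetical helper type)."""
    lower: float
    upper: float


def bias_ratio(candidate: ReferenceInterval,
               standard: ReferenceInterval) -> tuple[float, float]:
    """Return (BR of lower limit, BR of upper limit).

    ASSUMPTION: the scale is the SD implied by the standard RI when it is
    read as a Gaussian central 95% interval, i.e. (upper - lower) / 3.92.
    This definition is not stated in the catalog record.
    """
    sd = (standard.upper - standard.lower) / 3.92
    if sd <= 0:
        raise ValueError("standard interval must have upper > lower")
    return (abs(candidate.lower - standard.lower) / sd,
            abs(candidate.upper - standard.upper) / sd)


def bias_ratio_matrix(candidates: dict[str, ReferenceInterval],
                      standard: ReferenceInterval) -> dict[str, tuple[float, float]]:
    """Map each algorithm name to its pair of bias ratios against the standard RI."""
    return {name: bias_ratio(ri, standard) for name, ri in candidates.items()}
```

Calling `bias_ratio_matrix({"EM": ReferenceInterval(...), "refineR": ReferenceInterval(...)}, standard)` with the limits for one hormone would yield one row of such a matrix; smaller values mean the algorithm's limits sit closer to the standard ones. No values from the study are reproduced here.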