
Utilization of five data mining algorithms combined with simplified preprocessing to establish reference intervals of thyroid-related hormones for non-elderly adults

BACKGROUND: Despite the extensive research on data mining algorithms, there is still a lack of a standard protocol to evaluate the performance of the existing algorithms. Therefore, the study aims to provide a novel procedure that combines data mining algorithms and simplified preprocessing to estab...

Bibliographic Details
Main authors: Zhong, Jian; Ma, Chaochao; Hou, Li’an; Yin, Yicong; Zhao, Fang; Hu, Yingying; Song, Ailing; Wang, Danchen; Li, Lei; Cheng, Xinqi; Qiu, Ling
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10152698/
https://www.ncbi.nlm.nih.gov/pubmed/37131135
http://dx.doi.org/10.1186/s12874-023-01898-5
author Zhong, Jian
Ma, Chaochao
Hou, Li’an
Yin, Yicong
Zhao, Fang
Hu, Yingying
Song, Ailing
Wang, Danchen
Li, Lei
Cheng, Xinqi
Qiu, Ling
author_sort Zhong, Jian
collection PubMed
description BACKGROUND: Despite extensive research on data mining algorithms, there is still no standard protocol for evaluating the performance of existing algorithms. This study therefore aims to provide a novel procedure that combines data mining algorithms with simplified preprocessing to establish reference intervals (RIs), and to assess the performance of five algorithms objectively. METHODS: Two data sets were derived from a population undergoing physical examination. The Hoffmann, Bhattacharya, expectation-maximization (EM), kosmic, and refineR algorithms, each combined with two-step data preprocessing, were applied to the Test data set to establish RIs for thyroid-related hormones. The algorithm-calculated RIs were compared with the standard RIs calculated from the Reference data set, in which reference individuals were selected following strict inclusion and exclusion criteria. The methods were assessed objectively using the bias ratio (BR) matrix. RESULTS: RIs of thyroid-related hormones were established. The thyroid-stimulating hormone (TSH) RIs established by the EM algorithm were highly consistent with the standard TSH RIs (BR = 0.063), although the EM algorithm seems to perform poorly on the other hormones. The RIs calculated by the Hoffmann, Bhattacharya, and refineR methods for free and total triiodothyronine and for free and total thyroxine were close to and matched the standard RIs. CONCLUSION: An effective approach for objectively evaluating algorithm performance based on the BR matrix was established. The EM algorithm combined with simplified preprocessing can handle data with significant skewness, but its performance is limited in other scenarios. The other four algorithms perform well for data with a Gaussian or near-Gaussian distribution. Choosing the algorithm according to the distribution characteristics of the data is recommended.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-023-01898-5.
format Online
Article
Text
id pubmed-10152698
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10152698 2023-05-03 BMC Med Res Methodol Research BioMed Central 2023-05-02 /pmc/articles/PMC10152698/ /pubmed/37131135 http://dx.doi.org/10.1186/s12874-023-01898-5 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Utilization of five data mining algorithms combined with simplified preprocessing to establish reference intervals of thyroid-related hormones for non-elderly adults
topic Research