
A new scoring system in Cystic Fibrosis: statistical tools for database analysis – a preliminary report

BACKGROUND: Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessment of cystic fibrosis disease severity have been used for almost 50 years, without being adapted to the milder phenotype of the disease in the 21st century. The aim of thi...


Bibliographic Details
Main authors: Hafen, GM, Hurst, C, Yearwood, J, Smith, J, Dzalilov, Z, Robinson, PJ
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2580762/
https://www.ncbi.nlm.nih.gov/pubmed/18834547
http://dx.doi.org/10.1186/1472-6947-8-44
_version_ 1782160610820096000
author Hafen, GM
Hurst, C
Yearwood, J
Smith, J
Dzalilov, Z
Robinson, PJ
collection PubMed
description BACKGROUND: Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessing cystic fibrosis disease severity have been used for almost 50 years without being adapted to the milder phenotype of the disease in the 21st century. The aim of the current project is to develop a new scoring system using a database and employing various statistical tools. This study protocol reports the development of the statistical tools needed to create such a scoring system. METHODS: The evaluation is based on the Cystic Fibrosis database from the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of all the data records was performed using a range of clustering algorithms, in particular incremental clustering algorithms. The clusters obtained were characterised using rules from decision trees, and the results were examined by clinicians. To obtain a clearer definition of classes, expert opinion of each individual's clinical severity was sought. After data preparation, including expert opinion of each individual's clinical severity on a 3-point scale (mild, moderate and severe disease), two multivariate techniques were used throughout the analysis to establish which method would be more successful in feature selection and model derivation: 'Canonical Analysis of Principal Coordinates' (CAP) and 'Linear Discriminant Analysis'. A 3-step procedure was performed: (1) selection of features, (2) extraction of 5 severity classes from the 3 severity classes defined by expert opinion and (3) establishment of calibration datasets. RESULTS: (1) Feature selection: CAP has a more effective "modelling" focus than DA. (2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale (mild/moderate and moderate/severe), the Discriminant Function (DF) was used to determine the new groups: mild, intermediate moderate, moderate, intermediate severe and severe disease. (3) The generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with a majority of misallocations into adjacent severity classes, particularly for males. CONCLUSION: Our preliminary data show that using CAP for feature selection and linear DA to derive the actual model in a CF database might be helpful in developing a scoring system. However, there are several limitations: in particular, more data entry points are needed to finalize a score, and the statistical tools need to be further refined and validated by re-running the statistical methods on the larger dataset.
format Text
id pubmed-2580762
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2580762 2008-11-07 A new scoring system in Cystic Fibrosis: statistical tools for database analysis – a preliminary report Hafen, GM Hurst, C Yearwood, J Smith, J Dzalilov, Z Robinson, PJ BMC Med Inform Decis Mak Research Article BioMed Central 2008-10-05 /pmc/articles/PMC2580762/ /pubmed/18834547 http://dx.doi.org/10.1186/1472-6947-8-44 Text en Copyright © 2008 Hafen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A new scoring system in Cystic Fibrosis: statistical tools for database analysis – a preliminary report
topic Research Article