Harmonizing Healthy Cohorts to Support Multicenter Studies on Migraine Classification using Brain MRI Data
Main authors: | , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327280/ https://www.ncbi.nlm.nih.gov/pubmed/37425905 http://dx.doi.org/10.1101/2023.06.26.23291909 |
Summary: | Multicenter and multi-scanner imaging studies might be needed to provide sample sizes large enough for developing accurate predictive models. However, multicenter studies likely include confounding factors due to subtle differences in research participant characteristics, MRI scanners, and imaging acquisition protocols, and so might not yield generalizable machine learning models; that is, models developed using one dataset may not be applicable to a different dataset. The generalizability of classification models is key for multi-scanner and multicenter studies, and for providing reproducible results. This study developed a data harmonization strategy to identify healthy controls with similar (homogeneous) characteristics from multicenter studies, in order to validate the generalization of machine-learning techniques for classifying individual migraine patients and healthy controls using brain MRI data. The Maximum Mean Discrepancy (MMD) was used to compare the two datasets represented in Geodesic Flow Kernel (GFK) space, capturing the data variabilities for identifying a “healthy core”. A set of homogeneous healthy controls can help overcome some of the unwanted heterogeneity and allow for the development of classification models that have high accuracy when applied to new datasets. Extensive experimental results demonstrate the utility of the healthy core. One dataset consists of 120 individuals (66 with migraine and 54 healthy controls) and the other consists of 76 individuals (34 with migraine and 42 healthy controls). A homogeneous dataset derived from a cohort of healthy controls improves the accuracy of classification models by about 25% for both episodic and chronic migraineurs. |
---|
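
The summary names the Maximum Mean Discrepancy (MMD) as the measure used to compare the two datasets. As a rough illustration only, the sketch below computes the standard biased estimate of squared MMD between two sample sets. It is not the study's implementation: it swaps in a plain RBF kernel rather than the Geodesic Flow Kernel, the function names `rbf_kernel` and `mmd2_biased` and the `gamma` value are hypothetical, and the features are synthetic random numbers (only the group sizes, 54 and 42, echo the healthy-control counts in the summary).

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2_biased(x, y, gamma=0.1):
    """Biased estimate of squared MMD: mean(Kxx) + mean(Kyy) - 2 * mean(Kxy)."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


# Synthetic stand-ins for two healthy-control cohorts (assumption: 5 features).
rng = np.random.default_rng(0)
site_a = rng.normal(0.0, 1.0, size=(54, 5))
site_b_similar = rng.normal(0.0, 1.0, size=(42, 5))   # same distribution as site_a
site_b_shifted = rng.normal(3.0, 1.0, size=(42, 5))   # simulated scanner/site shift

mmd_similar = mmd2_biased(site_a, site_b_similar)
mmd_shifted = mmd2_biased(site_a, site_b_shifted)
print(f"similar cohorts: {mmd_similar:.4f}")
print(f"shifted cohorts: {mmd_shifted:.4f}")
```

A smaller value suggests the two sample sets are harder to tell apart under the chosen kernel, which is the intuition behind selecting a homogeneous subset of healthy controls across sites. The kernel and its bandwidth strongly affect the result, so the numbers from this sketch should not be compared with anything reported in the study.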