
Machine learning in medicine: a practical introduction

BACKGROUND: Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning a...


Bibliographic Details
Main Authors: Sidey-Gibbons, Jenni A. M., Sidey-Gibbons, Chris J.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6425557/
https://www.ncbi.nlm.nih.gov/pubmed/30890124
http://dx.doi.org/10.1186/s12874-019-0681-4
_version_ 1783404859582578688
author Sidey-Gibbons, Jenni A. M.
Sidey-Gibbons, Chris J.
collection PubMed
description BACKGROUND: Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely available open-source software and public domain data. METHODS: We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized General Linear Model regression (GLMs), Support Vector Machines (SVMs) with a radial basis function kernel, and single-layer Artificial Neural Networks. The publicly available dataset describing the breast mass samples (N = 683) was randomly split into evaluation (n = 456) and validation (n = 227) samples. We trained algorithms on data from the evaluation sample before they were used to predict the diagnostic outcome in the validation dataset. We compared the predictions made on the validation dataset with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models. We explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment. RESULTS: The trained algorithms were able to classify cell nuclei with high accuracy (.94-.96), sensitivity (.97-.99), and specificity (.85-.94). Maximum accuracy (.96) and area under the curve (.97) were achieved using the SVM algorithm. Prediction performance increased marginally (accuracy = .97, sensitivity = .99, specificity = .95) when algorithms were arranged into a voting ensemble.
CONCLUSIONS: We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles which we demonstrate here can be readily applied to other complex tasks including natural language processing and image recognition. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12874-019-0681-4) contains supplementary material, which is available to authorized users.
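The evaluation step described in the abstract (comparing predictions with recorded diagnoses, then combining three models by vote) can be sketched roughly as follows. This is a minimal Python illustration, not the authors' R code: the function names, the choice of "malignant" as the positive class, and the toy labels are all assumptions made up for the example, and none of the numbers come from the article.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="malignant"):
    """Count true/false positives and negatives for a binary outcome."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def metrics(y_true, y_pred, positive="malignant"):
    """Accuracy, sensitivity and specificity from the confusion counts.

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of positives that were found
        "specificity": tn / (tn + fp),  # share of negatives that were found
    }


def majority_vote(*model_predictions):
    """Voting ensemble: each case gets the label most models predicted.

    With three models and two classes a tie cannot occur.
    """
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*model_predictions)
    ]


# Toy labels, invented for illustration only (M = malignant, B = benign).
M, B = "malignant", "benign"
y_true = [M, M, M, B, B, B, B, B]
glm = [M, M, B, B, B, B, B, M]  # hypothetical predictions of model 1
svm = [M, B, M, B, B, B, B, B]  # hypothetical predictions of model 2
ann = [M, M, M, M, B, B, B, B]  # hypothetical predictions of model 3

ensemble = majority_vote(glm, svm, ann)
for name, pred in [("glm", glm), ("svm", svm), ("ann", ann), ("vote", ensemble)]:
    print(name, metrics(y_true, pred))
```

In this toy case each model makes at least one error on a different case, so the vote corrects all of them. That is the intuition behind the marginal ensemble gain the abstract reports, although real models tend to share errors and gain less.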
format Online
Article
Text
id pubmed-6425557
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6425557 2019-03-29 Machine learning in medicine: a practical introduction Sidey-Gibbons, Jenni A. M. Sidey-Gibbons, Chris J. BMC Med Res Methodol Research Article BioMed Central 2019-03-19 /pmc/articles/PMC6425557/ /pubmed/30890124 http://dx.doi.org/10.1186/s12874-019-0681-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Machine learning in medicine: a practical introduction
topic Research Article