
Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data

Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised...


Bibliographic Details
Main Authors: Held, Elizabeth; Cape, Joshua; Tintle, Nathan
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133520/
https://www.ncbi.nlm.nih.gov/pubmed/27980626
http://dx.doi.org/10.1186/s12919-016-0020-2
_version_ 1782471279929982976
author Held, Elizabeth
Cape, Joshua
Tintle, Nathan
collection PubMed
description Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
format Online
Article
Text
id pubmed-5133520
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-51335202016-12-15 Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data Held, Elizabeth Cape, Joshua Tintle, Nathan BMC Proc Proceedings Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data. BioMed Central 2016-10-18 /pmc/articles/PMC5133520/ /pubmed/27980626 http://dx.doi.org/10.1186/s12919-016-0020-2 Text en © The Author(s). 
2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Proceedings
Held, Elizabeth
Cape, Joshua
Tintle, Nathan
Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data
title Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133520/
https://www.ncbi.nlm.nih.gov/pubmed/27980626
http://dx.doi.org/10.1186/s12919-016-0020-2