Machine learning derived risk prediction of anorexia nervosa

BACKGROUND: Anorexia nervosa (AN) is a complex psychiatric disease with a moderate to strong genetic contribution. In addition to conventional genome-wide association (GWA) studies, researchers have been using machine learning methods in conjunction with genomic data to predict risk of diseases in w...

Bibliographic Details
Main Authors: Guo, Yiran, Wei, Zhi, Keating, Brendan J., Hakonarson, Hakon
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4721143/
https://www.ncbi.nlm.nih.gov/pubmed/26792494
http://dx.doi.org/10.1186/s12920-016-0165-x
author Guo, Yiran
Wei, Zhi
Keating, Brendan J.
Hakonarson, Hakon
collection PubMed
description BACKGROUND: Anorexia nervosa (AN) is a complex psychiatric disease with a moderate to strong genetic contribution. In addition to conventional genome-wide association (GWA) studies, researchers have been using machine learning methods in conjunction with genomic data to predict risk of diseases in which genetics play an important role. METHODS: In this study, we collected whole-genome genotyping data on 3940 AN cases and 9266 controls from the Genetic Consortium for Anorexia Nervosa (GCAN), the Wellcome Trust Case Control Consortium 3 (WTCCC3), the Price Foundation Collaborative Group and the Children’s Hospital of Philadelphia (CHOP), and applied machine learning methods to predict AN disease risk. Prediction performance is measured by the area under the receiver operating characteristic curve (AUC), which indicates how well the model distinguishes cases from unaffected control subjects. RESULTS: A logistic regression model with the lasso penalty achieved an AUC of 0.693, while Support Vector Machines and Gradient Boosted Trees reached AUCs of 0.691 and 0.623, respectively. Results obtained with different sample sizes suggest that larger datasets are required to optimize the machine learning models and achieve higher AUC values. CONCLUSIONS: To our knowledge, this is the first attempt to assess AN risk based on genome-wide genotype-level data. Future integration of genomic, environmental and family-based information is likely to improve the AN risk evaluation process, eventually benefitting AN patients and families in the clinical setting. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12920-016-0165-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4721143
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4721143 2016-01-22 Machine learning derived risk prediction of anorexia nervosa Guo, Yiran Wei, Zhi Keating, Brendan J. Hakonarson, Hakon BMC Med Genomics Research Article
BioMed Central 2016-01-20 /pmc/articles/PMC4721143/ /pubmed/26792494 http://dx.doi.org/10.1186/s12920-016-0165-x Text en © Guo et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Machine learning derived risk prediction of anorexia nervosa
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4721143/
https://www.ncbi.nlm.nih.gov/pubmed/26792494
http://dx.doi.org/10.1186/s12920-016-0165-x