Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches

Smoking is a complex behavior with a heritability as high as 50%. Given such a large genetic contribution, it provides an opportunity to prevent those individuals who are susceptible to smoking dependence from ever starting to smoke by predicting their inherited predisposition with their genomic pro...

Bibliographic Details
Main authors: Xu, Yi, Cao, Liyu, Zhao, Xinyi, Yao, Yinghao, Liu, Qiang, Zhang, Bin, Wang, Yan, Mao, Ying, Ma, Yunlong, Ma, Jennie Z., Payne, Thomas J., Li, Ming D., Li, Lanjuan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7241440/
https://www.ncbi.nlm.nih.gov/pubmed/32477189
http://dx.doi.org/10.3389/fpsyt.2020.00416
_version_ 1783537069411270656
author Xu, Yi
Cao, Liyu
Zhao, Xinyi
Yao, Yinghao
Liu, Qiang
Zhang, Bin
Wang, Yan
Mao, Ying
Ma, Yunlong
Ma, Jennie Z.
Payne, Thomas J.
Li, Ming D.
Li, Lanjuan
author_facet Xu, Yi
Cao, Liyu
Zhao, Xinyi
Yao, Yinghao
Liu, Qiang
Zhang, Bin
Wang, Yan
Mao, Ying
Ma, Yunlong
Ma, Jennie Z.
Payne, Thomas J.
Li, Ming D.
Li, Lanjuan
author_sort Xu, Yi
collection PubMed
description Smoking is a complex behavior with a heritability as high as 50%. Given such a large genetic contribution, it provides an opportunity to prevent those individuals who are susceptible to smoking dependence from ever starting to smoke by predicting their inherited predisposition with their genomic profiles. Although previous studies have identified many susceptibility variants for smoking, they have limited power to predict smoking behavior. We applied the support vector machine (SVM) and random forest (RF) methods to build prediction models for smoking behavior. We first used 1,431 smokers and 1,503 non-smokers of African origin for model building with a 10-fold cross-validation and then tested the prediction models on an independent dataset consisting of 213 smokers and 224 non-smokers. The SVM model with 500 top single nucleotide polymorphisms (SNPs) selected using logistic regression (p < 0.01) as the feature selection method achieved area under the curve (AUC) values of 0.691, 0.721, and 0.720 for the training, test, and independent test samples, respectively. The RF model with 500 top SNPs selected using logistic regression (p < 0.01) achieved AUCs of 0.671, 0.665, and 0.667 for the training, test, and independent test samples, respectively. Finally, we used the combined logistic (p < 0.01) and LASSO (λ = 10⁻³) regression to select features and the SVM algorithm for model building. The SVM model with 500 top SNPs achieved AUCs of 0.756, 0.776, and 0.897 for the training, test, and independent test samples, respectively. We conclude that machine learning methods are promising means to build predictive models for smoking.
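The abstract reports model quality as AUC. As a reading aid only, here is a minimal sketch of how that metric can be computed from labels and model scores, using the rank-based (Mann-Whitney) definition. The function name `roc_auc` and the toy data are illustrative assumptions; they are not taken from the article and do not reproduce the authors' pipeline.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 1 = smoker, 0 = non-smoker.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this definition an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of smokers from non-smokers.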
format Online
Article
Text
id pubmed-7241440
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7241440 2020-05-29 Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches Xu, Yi Cao, Liyu Zhao, Xinyi Yao, Yinghao Liu, Qiang Zhang, Bin Wang, Yan Mao, Ying Ma, Yunlong Ma, Jennie Z. Payne, Thomas J. Li, Ming D. Li, Lanjuan Front Psychiatry Psychiatry Smoking is a complex behavior with a heritability as high as 50%. Given such a large genetic contribution, it provides an opportunity to prevent those individuals who are susceptible to smoking dependence from ever starting to smoke by predicting their inherited predisposition with their genomic profiles. Although previous studies have identified many susceptibility variants for smoking, they have limited power to predict smoking behavior. We applied the support vector machine (SVM) and random forest (RF) methods to build prediction models for smoking behavior. We first used 1,431 smokers and 1,503 non-smokers of African origin for model building with a 10-fold cross-validation and then tested the prediction models on an independent dataset consisting of 213 smokers and 224 non-smokers. The SVM model with 500 top single nucleotide polymorphisms (SNPs) selected using logistic regression (p < 0.01) as the feature selection method achieved area under the curve (AUC) values of 0.691, 0.721, and 0.720 for the training, test, and independent test samples, respectively. The RF model with 500 top SNPs selected using logistic regression (p < 0.01) achieved AUCs of 0.671, 0.665, and 0.667 for the training, test, and independent test samples, respectively. Finally, we used the combined logistic (p < 0.01) and LASSO (λ = 10⁻³) regression to select features and the SVM algorithm for model building. The SVM model with 500 top SNPs achieved AUCs of 0.756, 0.776, and 0.897 for the training, test, and independent test samples, respectively. We conclude that machine learning methods are promising means to build predictive models for smoking. Frontiers Media S.A.
2020-05-14 /pmc/articles/PMC7241440/ /pubmed/32477189 http://dx.doi.org/10.3389/fpsyt.2020.00416 Text en Copyright © 2020 Xu, Cao, Zhao, Yao, Liu, Zhang, Wang, Mao, Ma, Ma, Payne, Li and Li http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Psychiatry
Xu, Yi
Cao, Liyu
Zhao, Xinyi
Yao, Yinghao
Liu, Qiang
Zhang, Bin
Wang, Yan
Mao, Ying
Ma, Yunlong
Ma, Jennie Z.
Payne, Thomas J.
Li, Ming D.
Li, Lanjuan
Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches
title Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches
title_full Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches
title_fullStr Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches
title_full_unstemmed Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches
title_short Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches
title_sort prediction of smoking behavior from single nucleotide polymorphisms with machine learning approaches
topic Psychiatry
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7241440/
https://www.ncbi.nlm.nih.gov/pubmed/32477189
http://dx.doi.org/10.3389/fpsyt.2020.00416
work_keys_str_mv AT xuyi predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT caoliyu predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT zhaoxinyi predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT yaoyinghao predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT liuqiang predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT zhangbin predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT wangyan predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT maoying predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT mayunlong predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT majenniez predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT paynethomasj predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT limingd predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches
AT lilanjuan predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches