Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches
Smoking is a complex behavior with a heritability as high as 50%. Given such a large genetic contribution, it provides an opportunity to prevent those individuals who are susceptible to smoking dependence from ever starting to smoke by predicting their inherited predisposition with their genomic pro...
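Main authors: | Xu, Yi; Cao, Liyu; Zhao, Xinyi; Yao, Yinghao; Liu, Qiang; Zhang, Bin; Wang, Yan; Mao, Ying; Ma, Yunlong; Ma, Jennie Z.; Payne, Thomas J.; Li, Ming D.; Li, Lanjuan |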
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Psychiatry |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7241440/ https://www.ncbi.nlm.nih.gov/pubmed/32477189 http://dx.doi.org/10.3389/fpsyt.2020.00416 |
_version_ | 1783537069411270656 |
---|---|
author | Xu, Yi Cao, Liyu Zhao, Xinyi Yao, Yinghao Liu, Qiang Zhang, Bin Wang, Yan Mao, Ying Ma, Yunlong Ma, Jennie Z. Payne, Thomas J. Li, Ming D. Li, Lanjuan |
author_facet | Xu, Yi Cao, Liyu Zhao, Xinyi Yao, Yinghao Liu, Qiang Zhang, Bin Wang, Yan Mao, Ying Ma, Yunlong Ma, Jennie Z. Payne, Thomas J. Li, Ming D. Li, Lanjuan |
author_sort | Xu, Yi |
collection | PubMed |
description | Smoking is a complex behavior with a heritability as high as 50%. Given such a large genetic contribution, it provides an opportunity to prevent those individuals who are susceptible to smoking dependence from ever starting to smoke by predicting their inherited predisposition with their genomic profiles. Although previous studies have identified many susceptibility variants for smoking, they have limited power to predict smoking behavior. We applied the support vector machine (SVM) and random forest (RF) methods to build prediction models for smoking behavior. We first used 1,431 smokers and 1,503 non-smokers of African origin for model building with a 10-fold cross-validation and then tested the prediction models on an independent dataset consisting of 213 smokers and 224 non-smokers. The SVM model with 500 top single nucleotide polymorphisms (SNPs) selected using logistic regression (p < 0.01) as the feature selection method achieved an area under the curve (AUC) of 0.691, 0.721, and 0.720 for the training, test, and independent test samples, respectively. The RF model with 500 top SNPs selected using logistic regression (p < 0.01) achieved AUCs of 0.671, 0.665, and 0.667 for the training, test, and independent test samples, respectively. Finally, we used the combined logistic (p < 0.01) and LASSO (λ = 10⁻³) regression to select features and the SVM algorithm for model building. The SVM model with 500 top SNPs achieved AUCs of 0.756, 0.776, and 0.897 for the training, test, and independent test samples, respectively. We conclude that machine learning methods are promising means to build predictive models for smoking. |
format | Online Article Text |
id | pubmed-7241440 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-72414402020-05-29 Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches Xu, Yi Cao, Liyu Zhao, Xinyi Yao, Yinghao Liu, Qiang Zhang, Bin Wang, Yan Mao, Ying Ma, Yunlong Ma, Jennie Z. Payne, Thomas J. Li, Ming D. Li, Lanjuan Front Psychiatry Psychiatry Smoking is a complex behavior with a heritability as high as 50%. Given such a large genetic contribution, it provides an opportunity to prevent those individuals who are susceptible to smoking dependence from ever starting to smoke by predicting their inherited predisposition with their genomic profiles. Although previous studies have identified many susceptibility variants for smoking, they have limited power to predict smoking behavior. We applied the support vector machine (SVM) and random forest (RF) methods to build prediction models for smoking behavior. We first used 1,431 smokers and 1,503 non-smokers of African origin for model building with a 10-fold cross-validation and then tested the prediction models on an independent dataset consisting of 213 smokers and 224 non-smokers. The SVM model with 500 top single nucleotide polymorphisms (SNPs) selected using logistic regression (p < 0.01) as the feature selection method achieved an area under the curve (AUC) of 0.691, 0.721, and 0.720 for the training, test, and independent test samples, respectively. The RF model with 500 top SNPs selected using logistic regression (p < 0.01) achieved AUCs of 0.671, 0.665, and 0.667 for the training, test, and independent test samples, respectively. Finally, we used the combined logistic (p < 0.01) and LASSO (λ = 10⁻³) regression to select features and the SVM algorithm for model building. The SVM model with 500 top SNPs achieved AUCs of 0.756, 0.776, and 0.897 for the training, test, and independent test samples, respectively. We conclude that machine learning methods are promising means to build predictive models for smoking. Frontiers Media S.A. 
2020-05-14 /pmc/articles/PMC7241440/ /pubmed/32477189 http://dx.doi.org/10.3389/fpsyt.2020.00416 Text en Copyright © 2020 Xu, Cao, Zhao, Yao, Liu, Zhang, Wang, Mao, Ma, Ma, Payne, Li and Li http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Psychiatry Xu, Yi Cao, Liyu Zhao, Xinyi Yao, Yinghao Liu, Qiang Zhang, Bin Wang, Yan Mao, Ying Ma, Yunlong Ma, Jennie Z. Payne, Thomas J. Li, Ming D. Li, Lanjuan Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches |
title | Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches |
title_full | Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches |
title_fullStr | Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches |
title_full_unstemmed | Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches |
title_short | Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches |
title_sort | prediction of smoking behavior from single nucleotide polymorphisms with machine learning approaches |
topic | Psychiatry |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7241440/ https://www.ncbi.nlm.nih.gov/pubmed/32477189 http://dx.doi.org/10.3389/fpsyt.2020.00416 |
work_keys_str_mv | AT xuyi predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT caoliyu predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT zhaoxinyi predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT yaoyinghao predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT liuqiang predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT zhangbin predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT wangyan predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT maoying predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT mayunlong predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT majenniez predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT paynethomasj predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT limingd predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches AT lilanjuan predictionofsmokingbehaviorfromsinglenucleotidepolymorphismswithmachinelearningapproaches |
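
The description field reports model quality as an area under the curve (AUC) estimated with 10-fold cross-validation. As a rough illustration of those two ideas only, the sketch below computes a rank-based AUC and splits sample indices into ten folds using the Python standard library. It is not the authors' code: the function names `auc` and `k_fold_indices`, the seed, and the toy scores are assumptions made for this example, and the SNP selection and SVM/RF fitting steps are not shown.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    (label 1) receives a higher score than a randomly chosen negative
    (label 0). Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k disjoint
    folds whose sizes differ by at most one. Each fold serves once as the
    held-out test set while the other k-1 folds are used for training."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


if __name__ == "__main__":
    # Toy scores (hypothetical): three of the four positive/negative
    # pairs are ranked correctly, so the AUC is 0.75.
    print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3]))

    # 1,431 smokers + 1,503 non-smokers = 2,934 samples, as in the description.
    folds = k_fold_indices(1431 + 1503, k=10)
    print([len(f) for f in folds])
```

In practice a library routine would normally replace both helpers; the pairwise loop is quadratic in sample count and is kept here only because it mirrors the definition of AUC directly.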