
Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods

BACKGROUND: Genotype–phenotype predictions are of great importance in genetics. These predictions can help to find genetic mutations causing variations in human beings. There are many approaches for finding the association which can be broadly categorized into two classes, statistical techniques, an...


Bibliographic Details
Main authors: Muneeb, Muhammad, Henschel, Andreas
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8056510/
https://www.ncbi.nlm.nih.gov/pubmed/33874881
http://dx.doi.org/10.1186/s12859-021-04077-9
author Muneeb, Muhammad
Henschel, Andreas
collection PubMed
description BACKGROUND: Genotype–phenotype prediction is of great importance in genetics: it can help identify the genetic mutations that cause variation among humans. Approaches for finding genotype–phenotype associations fall broadly into two classes: statistical techniques and machine learning. Statistical techniques are well suited to finding the actual SNPs that cause variation, whereas machine learning techniques are well suited to cases where the goal is only to classify people into categories. In this article, we examined the eye-color and Type-2 diabetes phenotypes. The proposed technique is a hybrid approach that combines elements of statistical techniques with elements of machine learning.
RESULTS: The main dataset for the eye-color phenotype consists of 806 people: 404 have blue-green eyes and 402 have brown eyes. After preprocessing, we generated 8 datasets containing different numbers of SNPs, using the mutation difference and thresholding at individual SNPs. We calculated three types of mutation at each SNP: no mutation, partial mutation, and full mutation. The data were then transformed for machine learning algorithms. We used nine classifiers (Random Forest, Extreme Gradient Boosting, ANN, LSTM, GRU, BiLSTM, 1D-CNN, ensembles of ANN, and ensembles of LSTM), which gave best accuracies of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96, respectively. Stacked ensembles of LSTM outperformed the other algorithms for 1560 SNPs, with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for blue-green eyes. The main dataset for Type-2 diabetes consists of 107 people, where 30 people are classified as cases and 74 people as controls. We used different linear thresholds to find the optimal number of SNPs for classification. The final model gave an accuracy of 0.97.
CONCLUSION: Genotype–phenotype predictions are very useful, especially in forensics. They can help identify associations between SNP variants and traits or diseases. Given more datasets, the predictive performance of machine learning models can be improved. Moreover, the non-linearity of the machine learning model and the combination of SNP mutations during training improve prediction. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.
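The preprocessing described in the abstract (three mutation types per SNP, then SNP selection by mutation difference and thresholding) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define "mutation difference", so it is assumed here to be the absolute difference in mean mutation level between the two classes. The 0/1/2 encoding, the function names, the threshold value, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed integer encoding of the three mutation types named in the abstract.
MUTATION_LEVEL: Dict[str, int] = {"none": 0, "partial": 1, "full": 2}


def encode(genotypes: Sequence[Sequence[str]]) -> List[List[int]]:
    """Map each person's per-SNP mutation label to an integer level."""
    return [[MUTATION_LEVEL[g] for g in person] for person in genotypes]


def mutation_difference(x: Sequence[Sequence[int]], labels: Sequence[int]) -> List[float]:
    """Per SNP: absolute difference in mean mutation level between class 0 and class 1.

    This definition is an assumption made for illustration only.
    """
    diffs = []
    for j in range(len(x[0])):
        group0 = [row[j] for row, y in zip(x, labels) if y == 0]
        group1 = [row[j] for row, y in zip(x, labels) if y == 1]
        diffs.append(abs(sum(group1) / len(group1) - sum(group0) / len(group0)))
    return diffs


def select_snps(diffs: Sequence[float], threshold: float) -> List[int]:
    """Keep the indices of SNPs whose mutation difference reaches the threshold."""
    return [j for j, d in enumerate(diffs) if d >= threshold]


# Made-up toy data: 4 people x 3 SNPs; label 0 = brown, 1 = blue-green.
genotypes = [
    ["none", "full", "partial"],
    ["none", "full", "none"],
    ["full", "full", "partial"],
    ["partial", "full", "none"],
]
labels = [0, 0, 1, 1]

x = encode(genotypes)
diffs = mutation_difference(x, labels)
selected = select_snps(diffs, threshold=1.0)
print(diffs, selected)
```

Varying the threshold changes how many SNPs are kept, which is one way to obtain several datasets with different numbers of SNPs, as the abstract describes. The columns selected this way would then be passed to a classifier.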
format Online
Article
Text
id pubmed-8056510
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8056510 2021-04-20 Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods Muneeb, Muhammad Henschel, Andreas BMC Bioinformatics Research Article BioMed Central 2021-04-19 /pmc/articles/PMC8056510/ /pubmed/33874881 http://dx.doi.org/10.1186/s12859-021-04077-9 Text en © The Author(s) 2021
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods
topic Research Article