Analysis of Half a Billion Datapoints Across Ten Machine-Learning Algorithms Identifies Key Elements Associated With Insulin Transcription in Human Pancreatic Islet Cells

Machine learning (ML)-workflows enable unprejudiced/robust evaluation of complex datasets. Here, we analyzed over 490,000,000 data points to compare 10 different ML-workflows in a large (N=11,652) training dataset of human pancreatic single-cell (sc-)transcriptomes to identify genes associated with...

Bibliographic Details
Main authors: Wong, Wilson K. M., Thorat, Vinod, Joglekar, Mugdha V., Dong, Charlotte X., Lee, Hugo, Chew, Yi Vee, Bhave, Adwait, Hawthorne, Wayne J., Engin, Feyza, Pant, Aniruddha, Dalgaard, Louise T., Bapat, Sharda, Hardikar, Anandwardhan A.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Endocrinology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8986156/
https://www.ncbi.nlm.nih.gov/pubmed/35399953
http://dx.doi.org/10.3389/fendo.2022.853863
author Wong, Wilson K. M.
Thorat, Vinod
Joglekar, Mugdha V.
Dong, Charlotte X.
Lee, Hugo
Chew, Yi Vee
Bhave, Adwait
Hawthorne, Wayne J.
Engin, Feyza
Pant, Aniruddha
Dalgaard, Louise T.
Bapat, Sharda
Hardikar, Anandwardhan A.
author_sort Wong, Wilson K. M.
collection PubMed
description Machine learning (ML)-workflows enable unprejudiced/robust evaluation of complex datasets. Here, we analyzed over 490,000,000 data points to compare 10 different ML-workflows in a large (N=11,652) training dataset of human pancreatic single-cell (sc-)transcriptomes to identify genes associated with the presence or absence of insulin transcript(s). Prediction accuracy/sensitivity of each ML-workflow was tested in a separate validation dataset (N=2,913). Ensemble ML-workflows, in particular the Random Forest ML-algorithm, delivered high predictive power (AUC=0.83) and sensitivity (0.98) compared to other algorithms. The transcripts identified through these analyses also demonstrated significant correlation with insulin in bulk RNA-seq data from human islets. The top-10 features (including IAPP, ADCYAP1, LDHA and SST) common to the three Ensemble ML-workflows were significantly dysregulated in scRNA-seq datasets from Ire-1α(β-/-) mice, which demonstrate dedifferentiation of pancreatic β-cells in a model of type 1 diabetes (T1D), and in pancreatic single cells from individuals with type 2 diabetes (T2D). Our findings provide a direct comparison of ML-workflows in big data analyses, identify key elements associated with insulin transcription, and provide workflows for future analyses.
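The abstract reports two evaluation metrics on the held-out validation set: AUC and sensitivity. The reported sizes (11,652 training and 2,913 validation cells) sum to 14,565, which corresponds to an 80/20 split. The following is a minimal sketch of how these two metrics can be computed for a binary insulin-present/insulin-absent classifier. It is not the authors' pipeline: the toy labels, scores, the 0.5 threshold, and the function names `sensitivity` and `roc_auc` are all assumptions made for illustration, and no Random Forest is trained here.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True-positive rate: TP / (TP + FN) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random
    negative (ties count as 0.5). Quadratic in sample count, so this form
    is only suitable for small illustrative inputs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = insulin transcript present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
# Hypothetical classifier scores (e.g. predicted probability of class 1).
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.7, 0.6, 0.55]
# Assumed decision threshold of 0.5 to turn scores into hard predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_sensitivity = sensitivity(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
print(f"sensitivity={toy_sensitivity:.2f} auc={toy_auc:.2f}")
```

Sensitivity depends on the chosen threshold, whereas AUC is computed from the ranking of scores alone, which is why a model can show a moderate AUC together with a very high sensitivity.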
format Online
Article
Text
id pubmed-8986156
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8986156 2022-04-07 Analysis of Half a Billion Datapoints Across Ten Machine-Learning Algorithms Identifies Key Elements Associated With Insulin Transcription in Human Pancreatic Islet Cells Wong, Wilson K. M. Thorat, Vinod Joglekar, Mugdha V. Dong, Charlotte X. Lee, Hugo Chew, Yi Vee Bhave, Adwait Hawthorne, Wayne J. Engin, Feyza Pant, Aniruddha Dalgaard, Louise T. Bapat, Sharda Hardikar, Anandwardhan A. Front Endocrinol (Lausanne) Endocrinology Machine learning (ML)-workflows enable unprejudiced/robust evaluation of complex datasets. Here, we analyzed over 490,000,000 data points to compare 10 different ML-workflows in a large (N=11,652) training dataset of human pancreatic single-cell (sc-)transcriptomes to identify genes associated with the presence or absence of insulin transcript(s). Prediction accuracy/sensitivity of each ML-workflow was tested in a separate validation dataset (N=2,913). Ensemble ML-workflows, in particular the Random Forest ML-algorithm, delivered high predictive power (AUC=0.83) and sensitivity (0.98) compared to other algorithms. The transcripts identified through these analyses also demonstrated significant correlation with insulin in bulk RNA-seq data from human islets. The top-10 features (including IAPP, ADCYAP1, LDHA and SST) common to the three Ensemble ML-workflows were significantly dysregulated in scRNA-seq datasets from Ire-1α(β-/-) mice, which demonstrate dedifferentiation of pancreatic β-cells in a model of type 1 diabetes (T1D), and in pancreatic single cells from individuals with type 2 diabetes (T2D). Our findings provide a direct comparison of ML-workflows in big data analyses, identify key elements associated with insulin transcription, and provide workflows for future analyses. Frontiers Media S.A.
2022-03-23 /pmc/articles/PMC8986156/ /pubmed/35399953 http://dx.doi.org/10.3389/fendo.2022.853863 Text en Copyright © 2022 Wong, Thorat, Joglekar, Dong, Lee, Chew, Bhave, Hawthorne, Engin, Pant, Dalgaard, Bapat and Hardikar https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Analysis of Half a Billion Datapoints Across Ten Machine-Learning Algorithms Identifies Key Elements Associated With Insulin Transcription in Human Pancreatic Islet Cells
topic Endocrinology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8986156/
https://www.ncbi.nlm.nih.gov/pubmed/35399953
http://dx.doi.org/10.3389/fendo.2022.853863