Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research
Research in environmental health is becoming increasingly reliant upon data science and computational methods that can more efficiently extract information from complex datasets. Data science and computational methods can be leveraged to better identify relationships between exposures to stressors i...
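Main authors: | Roell, Kyle; Koval, Lauren E.; Boyles, Rebecca; Patlewicz, Grace; Ring, Caroline; Rider, Cynthia V.; Ward-Caviness, Cavin; Reif, David M.; Jaspers, Ilona; Fry, Rebecca C.; Rager, Julia E. |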
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Toxicology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9257219/ https://www.ncbi.nlm.nih.gov/pubmed/35812168 http://dx.doi.org/10.3389/ftox.2022.893924 |
_version_ | 1784741295795732480 |
---|---|
author | Roell, Kyle Koval, Lauren E. Boyles, Rebecca Patlewicz, Grace Ring, Caroline Rider, Cynthia V. Ward-Caviness, Cavin Reif, David M. Jaspers, Ilona Fry, Rebecca C. Rager, Julia E. |
author_sort | Roell, Kyle |
collection | PubMed |
description | Research in environmental health is becoming increasingly reliant upon data science and computational methods that can more efficiently extract information from complex datasets. Data science and computational methods can be leveraged to better identify relationships between exposures to stressors in the environment and human disease outcomes, representing critical information needed to protect and improve global public health. Still, there remains a critical gap surrounding the training of researchers on these in silico methods. We aimed to address this gap by developing the inTelligence And Machine lEarning (TAME) Toolkit, promoting trainee-driven data generation, management, and analysis methods to “TAME” data in environmental health studies. Training modules were developed to provide applications-driven examples of data organization and analysis methods that can be used to address environmental health questions. Target audiences for these modules include students, post-baccalaureate and post-doctorate trainees, and professionals that are interested in expanding their skillset to include recent advances in data analysis methods relevant to environmental health, toxicology, exposure science, epidemiology, and bioinformatics/cheminformatics. Modules were developed by study coauthors using annotated script and were organized into three chapters within a GitHub Bookdown site. The first chapter of modules focuses on introductory data science, which includes the following topics: setting up R/RStudio and coding in the R environment; data organization basics; finding and visualizing data trends; high-dimensional data visualizations; and Findability, Accessibility, Interoperability, and Reusability (FAIR) data management practices. 
The second chapter of modules incorporates chemical-biological analyses and predictive modeling, spanning the following methods: dose-response modeling; machine learning and predictive modeling; mixtures analyses; -omics analyses; toxicokinetic modeling; and read-across toxicity predictions. The last chapter of modules was organized to provide examples on environmental health database mining and integration, including chemical exposure, health outcome, and environmental justice indicators. Training modules and associated data are publicly available online (https://uncsrp.github.io/Data-Analysis-Training-Modules/). Together, this resource provides unique opportunities to obtain introductory-level training on current data analysis methods applicable to 21st century science and environmental health. |
format | Online Article Text |
id | pubmed-9257219 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-92572192022-07-07 Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research Roell, Kyle Koval, Lauren E. Boyles, Rebecca Patlewicz, Grace Ring, Caroline Rider, Cynthia V. Ward-Caviness, Cavin Reif, David M. Jaspers, Ilona Fry, Rebecca C. Rager, Julia E. Front Toxicol Toxicology Research in environmental health is becoming increasingly reliant upon data science and computational methods that can more efficiently extract information from complex datasets. Data science and computational methods can be leveraged to better identify relationships between exposures to stressors in the environment and human disease outcomes, representing critical information needed to protect and improve global public health. Still, there remains a critical gap surrounding the training of researchers on these in silico methods. We aimed to address this gap by developing the inTelligence And Machine lEarning (TAME) Toolkit, promoting trainee-driven data generation, management, and analysis methods to “TAME” data in environmental health studies. Training modules were developed to provide applications-driven examples of data organization and analysis methods that can be used to address environmental health questions. Target audiences for these modules include students, post-baccalaureate and post-doctorate trainees, and professionals that are interested in expanding their skillset to include recent advances in data analysis methods relevant to environmental health, toxicology, exposure science, epidemiology, and bioinformatics/cheminformatics. Modules were developed by study coauthors using annotated script and were organized into three chapters within a GitHub Bookdown site. 
The first chapter of modules focuses on introductory data science, which includes the following topics: setting up R/RStudio and coding in the R environment; data organization basics; finding and visualizing data trends; high-dimensional data visualizations; and Findability, Accessibility, Interoperability, and Reusability (FAIR) data management practices. The second chapter of modules incorporates chemical-biological analyses and predictive modeling, spanning the following methods: dose-response modeling; machine learning and predictive modeling; mixtures analyses; -omics analyses; toxicokinetic modeling; and read-across toxicity predictions. The last chapter of modules was organized to provide examples on environmental health database mining and integration, including chemical exposure, health outcome, and environmental justice indicators. Training modules and associated data are publicly available online (https://uncsrp.github.io/Data-Analysis-Training-Modules/). Together, this resource provides unique opportunities to obtain introductory-level training on current data analysis methods applicable to 21st century science and environmental health. Frontiers Media S.A. 2022-06-22 /pmc/articles/PMC9257219/ /pubmed/35812168 http://dx.doi.org/10.3389/ftox.2022.893924 Text en Copyright © 2022 Roell, Koval, Boyles, Patlewicz, Ring, Rider, Ward-Caviness, Reif, Jaspers, Fry and Rager. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Toxicology Roell, Kyle Koval, Lauren E. Boyles, Rebecca Patlewicz, Grace Ring, Caroline Rider, Cynthia V. Ward-Caviness, Cavin Reif, David M. Jaspers, Ilona Fry, Rebecca C. Rager, Julia E. Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research |
title | Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research |
topic | Toxicology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9257219/ https://www.ncbi.nlm.nih.gov/pubmed/35812168 http://dx.doi.org/10.3389/ftox.2022.893924 |