
Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research

Research in environmental health is becoming increasingly reliant upon data science and computational methods that can more efficiently extract information from complex datasets. Data science and computational methods can be leveraged to better identify relationships between exposures to stressors i...


Bibliographic Details
Main Authors: Roell, Kyle; Koval, Lauren E.; Boyles, Rebecca; Patlewicz, Grace; Ring, Caroline; Rider, Cynthia V.; Ward-Caviness, Cavin; Reif, David M.; Jaspers, Ilona; Fry, Rebecca C.; Rager, Julia E.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Toxicology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9257219/
https://www.ncbi.nlm.nih.gov/pubmed/35812168
http://dx.doi.org/10.3389/ftox.2022.893924
collection PubMed
description Research in environmental health is becoming increasingly reliant upon data science and computational methods that can more efficiently extract information from complex datasets. Data science and computational methods can be leveraged to better identify relationships between exposures to stressors in the environment and human disease outcomes, representing critical information needed to protect and improve global public health. Still, there remains a critical gap surrounding the training of researchers on these in silico methods. We aimed to address this gap by developing the inTelligence And Machine lEarning (TAME) Toolkit, promoting trainee-driven data generation, management, and analysis methods to “TAME” data in environmental health studies. Training modules were developed to provide applications-driven examples of data organization and analysis methods that can be used to address environmental health questions. Target audiences for these modules include students, post-baccalaureate and post-doctorate trainees, and professionals that are interested in expanding their skillset to include recent advances in data analysis methods relevant to environmental health, toxicology, exposure science, epidemiology, and bioinformatics/cheminformatics. Modules were developed by study coauthors using annotated script and were organized into three chapters within a GitHub Bookdown site. The first chapter of modules focuses on introductory data science, which includes the following topics: setting up R/RStudio and coding in the R environment; data organization basics; finding and visualizing data trends; high-dimensional data visualizations; and Findability, Accessibility, Interoperability, and Reusability (FAIR) data management practices. 
The second chapter of modules incorporates chemical-biological analyses and predictive modeling, spanning the following methods: dose-response modeling; machine learning and predictive modeling; mixtures analyses; -omics analyses; toxicokinetic modeling; and read-across toxicity predictions. The last chapter of modules was organized to provide examples on environmental health database mining and integration, including chemical exposure, health outcome, and environmental justice indicators. Training modules and associated data are publicly available online (https://uncsrp.github.io/Data-Analysis-Training-Modules/). Together, this resource provides unique opportunities to obtain introductory-level training on current data analysis methods applicable to 21st century science and environmental health.
format Online
Article
Text
id pubmed-9257219
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9257219 2022-07-07 Front Toxicol Toxicology Frontiers Media S.A. 2022-06-22 /pmc/articles/PMC9257219/ /pubmed/35812168 http://dx.doi.org/10.3389/ftox.2022.893924 Text en Copyright © 2022 Roell, Koval, Boyles, Patlewicz, Ring, Rider, Ward-Caviness, Reif, Jaspers, Fry and Rager. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Toxicology