
Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence

Bibliographic Details
Main authors: Tewari, Shrankhala; Toledo Margalef, Pablo; Kareem, Ayesha; Abdul-Hussein, Ayah; White, Marina; Wazana, Ashley; Davidge, Sandra T.; Delrieux, Claudio; Connor, Kristin L.
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8621659/
https://www.ncbi.nlm.nih.gov/pubmed/34834416
http://dx.doi.org/10.3390/jpm11111064
collection PubMed
description The Developmental Origins of Health and Disease (DOHaD) framework aims to understand how early life exposures shape lifecycle health. To date, no comprehensive list of these exposures and their interactions has been developed, which limits our ability to predict trajectories of risk and resiliency in humans. To address this gap, we developed a model that uses text mining, machine learning, and natural language processing approaches to automate search, data extraction, and content analysis from DOHaD-related research articles available in PubMed. Our first model captured 2469 articles, which were subsequently categorised into topics based on word frequencies within the titles and abstracts. A manual screening validated 848 of these as relevant, which were used to develop a revised model that finally captured 2098 articles that largely fell under the most prominently researched domains related to our specific DOHaD focus. The articles were clustered according to latent topic extraction, and 23 experts in the field independently labelled the perceived topics. Consensus analysis on this labelling yielded mostly fair to substantial agreement, which demonstrates that automated models can be developed to successfully retrieve and classify research literature, as a first step to gather evidence related to DOHaD risk and resilience factors that influence later life human health.
format Online
Article
Text
id pubmed-8621659
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8621659 2021-11-27 J Pers Med Article MDPI 2021-10-22 /pmc/articles/PMC8621659/ /pubmed/34834416 http://dx.doi.org/10.3390/jpm11111064 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
topic Article