Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts
BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular ca...
Main authors: | Atabaki-Pasdar, Naeimeh, et al. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304567/ https://www.ncbi.nlm.nih.gov/pubmed/32559194 http://dx.doi.org/10.1371/journal.pmed.1003149 |
_version_ | 1783548280103239680 |
---|---|
author | Atabaki-Pasdar, Naeimeh Ohlsson, Mattias Viñuela, Ana Frau, Francesca Pomares-Millan, Hugo Haid, Mark Jones, Angus G. Thomas, E. Louise Koivula, Robert W. Kurbasic, Azra Mutie, Pascal M. Fitipaldi, Hugo Fernandez, Juan Dawed, Adem Y. Giordano, Giuseppe N. Forgie, Ian M. McDonald, Timothy J. Rutters, Femke Cederberg, Henna Chabanova, Elizaveta Dale, Matilda Masi, Federico De Thomas, Cecilia Engel Allin, Kristine H. Hansen, Tue H. Heggie, Alison Hong, Mun-Gwan Elders, Petra J. M. Kennedy, Gwen Kokkola, Tarja Pedersen, Helle Krogh Mahajan, Anubha McEvoy, Donna Pattou, Francois Raverdy, Violeta Häussler, Ragna S. Sharma, Sapna Thomsen, Henrik S. Vangipurapu, Jagadish Vestergaard, Henrik ‘t Hart, Leen M. Adamski, Jerzy Musholt, Petra B. Brage, Soren Brunak, Søren Dermitzakis, Emmanouil Frost, Gary Hansen, Torben Laakso, Markku Pedersen, Oluf Ridderstråle, Martin Ruetten, Hartmut Hattersley, Andrew T. Walker, Mark Beulens, Joline W. J. Mari, Andrea Schwenk, Jochen M. Gupta, Ramneek McCarthy, Mark I. Pearson, Ewan R. Bell, Jimmy D. Pavo, Imre Franks, Paul W. |
author_sort | Atabaki-Pasdar, Naeimeh |
collection | PubMed |
description | BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning. METHODS AND FINDINGS: We utilized the baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-image-derived liver fat content (<5% or ≥5%) available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), which compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well these findings will translate to people of other ancestries and exposed to environmental risk factors that differ from those of the present cohort. 
Another key limitation of this study is that the prediction was done on a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one. CONCLUSIONS: In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community. TRIAL REGISTRATION: ClinicalTrials.gov NCT03814915. |
format | Online Article Text |
id | pubmed-7304567 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-73045672020-06-19 Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts Atabaki-Pasdar, Naeimeh Ohlsson, Mattias Viñuela, Ana Frau, Francesca Pomares-Millan, Hugo Haid, Mark Jones, Angus G. Thomas, E. Louise Koivula, Robert W. Kurbasic, Azra Mutie, Pascal M. Fitipaldi, Hugo Fernandez, Juan Dawed, Adem Y. Giordano, Giuseppe N. Forgie, Ian M. McDonald, Timothy J. Rutters, Femke Cederberg, Henna Chabanova, Elizaveta Dale, Matilda Masi, Federico De Thomas, Cecilia Engel Allin, Kristine H. Hansen, Tue H. Heggie, Alison Hong, Mun-Gwan Elders, Petra J. M. Kennedy, Gwen Kokkola, Tarja Pedersen, Helle Krogh Mahajan, Anubha McEvoy, Donna Pattou, Francois Raverdy, Violeta Häussler, Ragna S. Sharma, Sapna Thomsen, Henrik S. Vangipurapu, Jagadish Vestergaard, Henrik ‘t Hart, Leen M. Adamski, Jerzy Musholt, Petra B. Brage, Soren Brunak, Søren Dermitzakis, Emmanouil Frost, Gary Hansen, Torben Laakso, Markku Pedersen, Oluf Ridderstråle, Martin Ruetten, Hartmut Hattersley, Andrew T. Walker, Mark Beulens, Joline W. J. Mari, Andrea Schwenk, Jochen M. Gupta, Ramneek McCarthy, Mark I. Pearson, Ewan R. Bell, Jimmy D. Pavo, Imre Franks, Paul W. PLoS Med Research Article BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning. METHODS AND FINDINGS: We utilized the baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). 
Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-image-derived liver fat content (<5% or ≥5%) available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), which compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well these findings will translate to people of other ancestries and exposed to environmental risk factors that differ from those of the present cohort. Another key limitation of this study is that the prediction was done on a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one. CONCLUSIONS: In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. 
We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community. TRIAL REGISTRATION: ClinicalTrials.gov NCT03814915. Public Library of Science 2020-06-19 /pmc/articles/PMC7304567/ /pubmed/32559194 http://dx.doi.org/10.1371/journal.pmed.1003149 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
title | Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304567/ https://www.ncbi.nlm.nih.gov/pubmed/32559194 http://dx.doi.org/10.1371/journal.pmed.1003149 |