BowSaw: Inferring Higher-Order Trait Interactions Associated With Complex Biological Phenotypes

Bibliographic Details
Main authors: DiMucci, Demetrius; Kon, Mark; Segrè, Daniel
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Molecular Biosciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8245782/
https://www.ncbi.nlm.nih.gov/pubmed/34222331
http://dx.doi.org/10.3389/fmolb.2021.663532
collection PubMed
description Machine learning is helping the interpretation of biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, e.g., from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can help search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While, in many instances, treating an algorithm as a black box is sufficient, it is interesting to pursue an enhanced understanding of how system variables end up contributing to a specific output, as an avenue toward new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest algorithm to identify combinations of variables (“rules”) frequently used for classification. We first apply BowSaw to a simulated dataset and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn’s disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.
format Online
Article
Text
id pubmed-8245782
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8245782 2021-07-02 Front Mol Biosci Molecular Biosciences Frontiers Media S.A. 2021-06-17 /pmc/articles/PMC8245782/ /pubmed/34222331 http://dx.doi.org/10.3389/fmolb.2021.663532 Text en
Copyright © 2021 DiMucci, Kon and Segrè. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Molecular Biosciences
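
The abstract describes mining the structure of a trained random forest for combinations of variables ("rules") that the trees frequently use together. The sketch below illustrates that general idea only; it is not the BowSaw implementation. It assumes a toy forest of hand-rolled Gini trees on binary features (standard library only), a simulated noise-free phenotype defined by the Boolean rule x0 AND x1, and a simple tally of variable pairs along the decision paths of positive samples. All function names, parameters and data here are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations, product


def gini(labels):
    """Gini impurity of a non-empty or empty list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def grow(rows, labels, depth=0, max_depth=4):
    """Grow one tree on binary features.

    A leaf is an int label; an internal node is (feature, left, right),
    where left holds samples with feature == 0 and right with feature == 1.
    """
    if depth == max_depth or len(set(labels)) == 1:
        return round(sum(labels) / len(labels))
    best = None  # (weighted child impurity, feature index)
    for f in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue  # this feature does not split the node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return round(sum(labels) / len(labels))
    f = best[1]
    lo = [(x, y) for x, y in zip(rows, labels) if x[f] == 0]
    hi = [(x, y) for x, y in zip(rows, labels) if x[f] == 1]
    return (f,
            grow([x for x, _ in lo], [y for _, y in lo], depth + 1, max_depth),
            grow([x for x, _ in hi], [y for _, y in hi], depth + 1, max_depth))


def path_features(tree, x):
    """Set of features tested on the root-to-leaf path taken by sample x."""
    used = set()
    while isinstance(tree, tuple):
        f, left, right = tree
        used.add(f)
        tree = right if x[f] == 1 else left
    return used


def frequent_pairs(rows, labels, n_trees=25, seed=0):
    """Count variable pairs that co-occur on decision paths of positive samples."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap resample
        forest.append(grow([rows[i] for i in idx], [labels[i] for i in idx]))
    counts = Counter()
    for x, y in zip(rows, labels):
        if y != 1:
            continue  # only tally paths for the phenotype of interest
        for tree in forest:
            counts.update(combinations(sorted(path_features(tree, x)), 2))
    return counts


# Simulated data (an assumption for illustration): 5 binary variables,
# phenotype = x0 AND x1; variables 2-4 are irrelevant.
rows = [list(bits) for bits in product([0, 1], repeat=5)] * 4
labels = [x[0] & x[1] for x in rows]

pair_counts = frequent_pairs(rows, labels)
top_pair = pair_counts.most_common(1)[0][0]
print("most frequent pair:", top_pair)
```

In this noise-free setting the two rule variables dominate the tally because every tree has to test both of them to separate the positive samples. Real data, label noise and per-split feature subsampling would spread the counts across more candidate combinations, which is the harder regime the abstract refers to.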