Synthesizing theories of human language with Bayesian program induction
Main authors: | Ellis, Kevin; Albright, Adam; Solar-Lezama, Armando; Tenenbaum, Joshua B.; O’Donnell, Timothy J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9427767/ https://www.ncbi.nlm.nih.gov/pubmed/36042196 http://dx.doi.org/10.1038/s41467-022-32012-w |
_version_ | 1784778968390434816 |
---|---|
author | Ellis, Kevin; Albright, Adam; Solar-Lezama, Armando; Tenenbaum, Joshua B.; O’Donnell, Timothy J. |
author_sort | Ellis, Kevin |
collection | PubMed |
description | Automated, data-driven construction and evaluation of scientific models and theories is a long-standing challenge in artificial intelligence. We present a framework for algorithmically synthesizing models of a basic part of human language: morpho-phonology, the system that builds word forms from sounds. We integrate Bayesian inference with program synthesis and representations inspired by linguistic theory and cognitive models of learning and discovery. Across 70 datasets from 58 diverse languages, our system synthesizes human-interpretable models for core aspects of each language’s morpho-phonology, sometimes approaching models posited by human linguists. Joint inference across all 70 data sets automatically synthesizes a meta-model encoding interpretable cross-language typological tendencies. Finally, the same algorithm captures few-shot learning dynamics, acquiring new morphophonological rules from just one or a few examples. These results suggest routes to more powerful machine-enabled discovery of interpretable models in linguistics and other scientific domains. |
format | Online Article Text |
id | pubmed-9427767 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9427767 (2022-09-01). Synthesizing theories of human language with Bayesian program induction. Ellis, Kevin; Albright, Adam; Solar-Lezama, Armando; Tenenbaum, Joshua B.; O’Donnell, Timothy J. Nat Commun, Article. Nature Publishing Group UK, 2022-08-30. /pmc/articles/PMC9427767/ /pubmed/36042196 http://dx.doi.org/10.1038/s41467-022-32012-w Text, en. © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Synthesizing theories of human language with Bayesian program induction |
topic | Article |