Synthesizing theories of human language with Bayesian program induction

Automated, data-driven construction and evaluation of scientific models and theories is a long-standing challenge in artificial intelligence. We present a framework for algorithmically synthesizing models of a basic part of human language: morpho-phonology, the system that builds word forms from sou...

Bibliographic Details
Main Authors: Ellis, Kevin; Albright, Adam; Solar-Lezama, Armando; Tenenbaum, Joshua B.; O’Donnell, Timothy J.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9427767/
https://www.ncbi.nlm.nih.gov/pubmed/36042196
http://dx.doi.org/10.1038/s41467-022-32012-w
author Ellis, Kevin
Albright, Adam
Solar-Lezama, Armando
Tenenbaum, Joshua B.
O’Donnell, Timothy J.
collection PubMed
description Automated, data-driven construction and evaluation of scientific models and theories is a long-standing challenge in artificial intelligence. We present a framework for algorithmically synthesizing models of a basic part of human language: morpho-phonology, the system that builds word forms from sounds. We integrate Bayesian inference with program synthesis and representations inspired by linguistic theory and cognitive models of learning and discovery. Across 70 datasets from 58 diverse languages, our system synthesizes human-interpretable models for core aspects of each language’s morpho-phonology, sometimes approaching models posited by human linguists. Joint inference across all 70 data sets automatically synthesizes a meta-model encoding interpretable cross-language typological tendencies. Finally, the same algorithm captures few-shot learning dynamics, acquiring new morphophonological rules from just one or a few examples. These results suggest routes to more powerful machine-enabled discovery of interpretable models in linguistics and other scientific domains.
format Online
Article
Text
id pubmed-9427767
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9427767 2022-09-01 Nat Commun Article Nature Publishing Group UK 2022-08-30 /pmc/articles/PMC9427767/ /pubmed/36042196 http://dx.doi.org/10.1038/s41467-022-32012-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Synthesizing theories of human language with Bayesian program induction
topic Article