Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space


Bibliographic Details
Main author: Dr. Benjamin, Martin
Language: eng
Published: 2015
Subjects: White Area Meetings
Online access: http://cds.cern.ch/record/2054123
_version_ 1780948239061614592
author Dr. Benjamin, Martin
collection CERN
description WhiteArea lectures' twiki: https://twiki.cern.ch/twiki/bin/view/LCG/WhiteAreas

How can we document detailed data about all the world's languages in a consistent, unified source, in a way that can serve knowledge and technology needs for people and their machines around the globe? Dictionaries have historically presented selective information about words and their meanings within a language, or translation equivalents between languages, in idiosyncratic, incommensurable formats with little basis in data science. The Kamusi Project introduces a new approach, conceiving of language as a matrix of interrelated data elements. By documenting these elements within each language, and linking elements at conceptual and functional nodes across languages, Kamusi aims toward an elusive Big Data goal: "every word in every language." If successful, the results will run the gamut from preserving the human heritage embedded in endangered languages, to providing international vocabularies for students to succeed in science, to a Star Trek-like universal translator embedded in your smart watch. In this talk, the project's founder discusses the nefarious complexities working against the creation of a universal language data platform, and the systems Kamusi has designed to collect, codify, and deploy quantum-level linguistic data within one massive global dictionary.

Bio: Martin Benjamin is the founder and director of the Kamusi Project (http://kamusi.org), an international non-profit dedicated to producing dictionary and learning resources for languages worldwide. Now resident in Lausanne, he was born and raised in the United States. His PhD in Anthropology, from Yale University, examined international aid in rural Tanzania. He is a senior scientist at the Distributed Information Systems Laboratory (LSIR) at EPFL, where he is developing methods to assemble reliable data across languages.
id cern-2054123
institution European Organization for Nuclear Research
language eng
publishDate 2015
record_format invenio
spelling cern-2054123 2022-11-02T22:35:17Z http://cds.cern.ch/record/2054123 eng Dr. Benjamin, Martin Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space White Area Meetings oai:cds.cern.ch:2054123 2015
title Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space
topic White Area Meetings
url http://cds.cern.ch/record/2054123