
Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space

WhiteArea lectures' twiki: https://twiki.cern.ch/twiki/bin/view/LCG/WhiteAreas. How can we document detailed data about all the world's languages in a consistent, unified source, in a way that can ser...


Bibliographic Details
Main author:	Dr. Benjamin, Martin
Language:	eng
Published:	2015
Subjects:	White Area Meetings
Online access:	http://cds.cern.ch/record/2054123

_version_	1780948239061614592
author	Dr. Benjamin, Martin
author_facet	Dr. Benjamin, Martin
author_sort	Dr. Benjamin, Martin
collection	CERN
description	WhiteArea lectures' twiki: https://twiki.cern.ch/twiki/bin/view/LCG/WhiteAreas. How can we document detailed data about all the world's languages in a consistent, unified source, in a way that can serve knowledge and technology needs for people and their machines around the globe? Dictionaries have historically presented selective information about words and their meanings within a language, or translation equivalents between languages, in idiosyncratic, incommensurable formats with little basis in data science. The Kamusi Project introduces a new approach, conceiving of language as a matrix of interrelated data elements. By documenting these elements within each language, and linking elements at conceptual and functional nodes across languages, Kamusi aims toward an elusive Big Data goal: "every word in every language." If successful, the results will run the gamut from preserving the human heritage embedded in endangered languages, to providing international vocabularies for students to succeed in science, to a Star Trek-like universal translator embedded in your smart watch. In this talk, the project's founder discusses the nefarious complexities working against the creation of a universal language data platform, and the systems Kamusi has designed to collect, codify, and deploy quantum-level linguistic data within one massive global dictionary. Bio: Martin Benjamin is the founder and director of the Kamusi Project (http://kamusi.org), an international non-profit dedicated to producing dictionary and learning resources for languages worldwide. Now resident in Lausanne, he was born and raised in the United States. His PhD in Anthropology, from Yale University, examined international aid in rural Tanzania. He is a senior scientist at the Distributed Information Systems Laboratory (LSIR) at EPFL, where he is developing methods to assemble reliable data across languages.
id	cern-2054123
institution	European Organization for Nuclear Research
language	eng
publishDate	2015
record_format	invenio
spelling	cern-2054123 2022-11-02T22:35:17Z http://cds.cern.ch/record/2054123 eng Dr. Benjamin, Martin White Area Meetings oai:cds.cern.ch:2054123 2015
spellingShingle	White Area Meetings Dr. Benjamin, Martin Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space
title	Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space
title_full	Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space
title_fullStr	Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space
title_full_unstemmed	Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space
title_short	Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space
title_sort	martin benjamin (epfl), the particles of language: "the dictionary" as elemental data for 7000 languages across time and space
topic	White Area Meetings
url	http://cds.cern.ch/record/2054123
work_keys_str_mv	AT drbenjaminmartin martinbenjaminepfltheparticlesoflanguagethedictionaryaselementaldatafor7000languagesacrosstimeandspace

