
A massively parallel corpus: the Bible in 100 languages

We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Finally we present a statistical analysis of the corpora collected and a detailed comparison between the English translation and other English corpora.


Bibliographic Details
Main authors: Christodouloupoulos, Christos; Steedman, Mark
Format: Online Article Text
Language: English
Published: Springer Netherlands 2014
Subjects: Original Paper
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4551210/
https://www.ncbi.nlm.nih.gov/pubmed/26321896
http://dx.doi.org/10.1007/s10579-014-9287-y
author Christodouloupoulos, Christos
Steedman, Mark
collection PubMed
description We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Finally we present a statistical analysis of the corpora collected and a detailed comparison between the English translation and other English corpora.
format Online
Article
Text
id pubmed-4551210
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Springer Netherlands
record_format MEDLINE/PubMed
spelling pubmed-4551210 2015-08-28 A massively parallel corpus: the Bible in 100 languages Christodouloupoulos, Christos Steedman, Mark Lang Resour Eval Original Paper We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Finally we present a statistical analysis of the corpora collected and a detailed comparison between the English translation and other English corpora. Springer Netherlands 2014-11-19 2015 /pmc/articles/PMC4551210/ /pubmed/26321896 http://dx.doi.org/10.1007/s10579-014-9287-y Text en © The Author(s) 2014 https://creativecommons.org/licenses/by/4.0/ Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
title A massively parallel corpus: the Bible in 100 languages
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4551210/
https://www.ncbi.nlm.nih.gov/pubmed/26321896
http://dx.doi.org/10.1007/s10579-014-9287-y