Natural language inference for Malayalam language using language agnostic sentence representation

Natural language inference (NLI) is an essential subtask in many natural language processing applications. It is a directional relationship from premise to hypothesis. A pair of texts is defined as entailed if the meaning of one text can be inferred from the other. NLI is also known as textual entailment...

Bibliographic Details
Main Authors:	Renjit, Sara; Idicula, Sumam
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Computational Linguistics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8114806/ https://www.ncbi.nlm.nih.gov/pubmed/34013034 http://dx.doi.org/10.7717/peerj-cs.508

author	Renjit, Sara; Idicula, Sumam
collection	PubMed
description	Natural language inference (NLI) is an essential subtask in many natural language processing applications. It is a directional relationship from premise to hypothesis. A pair of texts is defined as entailed if the meaning of one text can be inferred from the other. NLI is also known as textual entailment recognition, and it recognizes entailed and contradictory sentences in various NLP systems such as question answering, summarization, and information retrieval. This paper describes the NLI problem attempted for Malayalam, a low-resource Indian language and the regional language of Kerala, spoken by more than 30 million people. The paper presents the Malayalam NLI dataset, named MaNLI, and its application to NLI in Malayalam using different models, namely Doc2Vec (paragraph vector), fastText, BERT (Bidirectional Encoder Representation from Transformers), and LASER (Language Agnostic Sentence Representation). Our work attempts NLI in two ways, as binary classification and as multiclass classification. For both classifications, LASER outperformed the other techniques. For multiclass classification, NLI using the LASER-based sentence embedding technique outperformed the other techniques by a significant margin of 12% accuracy. For binary classification, the LASER-based NLI system also improved accuracy by 9% over the other techniques.
format	Online Article Text
id	pubmed-8114806
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8114806 2021-05-18 Natural language inference for Malayalam language using language agnostic sentence representation Renjit, Sara; Idicula, Sumam PeerJ Comput Sci Computational Linguistics PeerJ Inc. 2021-05-04 /pmc/articles/PMC8114806/ /pubmed/34013034 http://dx.doi.org/10.7717/peerj-cs.508 Text en © 2021 Renjit and Idicula https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	Natural language inference for Malayalam language using language agnostic sentence representation
topic	Computational Linguistics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8114806/ https://www.ncbi.nlm.nih.gov/pubmed/34013034 http://dx.doi.org/10.7717/peerj-cs.508
