Natural language inference for Malayalam language using language agnostic sentence representation

Bibliographic details
Main authors: Renjit, Sara; Idicula, Sumam
Format: Online article (text)
Language: English
Published: PeerJ Inc., 2021
Subjects: Computational Linguistics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8114806/
https://www.ncbi.nlm.nih.gov/pubmed/34013034
http://dx.doi.org/10.7717/peerj-cs.508
collection PubMed
description Natural language inference (NLI) is an essential subtask in many natural language processing (NLP) applications. It is a directional relationship from a premise to a hypothesis: a pair of texts is said to be entailed if the meaning of the hypothesis can be inferred from the premise. NLI, also known as recognizing textual entailment, identifies entailed and contradictory sentences in NLP systems such as question answering, summarization, and information retrieval. This paper addresses NLI for Malayalam, a low-resource Indian language and the regional language of Kerala, spoken by more than 30 million people. It presents a Malayalam NLI dataset, named MaNLI, and applies NLI to Malayalam using four models: Doc2Vec (paragraph vector), fastText, BERT (Bidirectional Encoder Representations from Transformers), and LASER (Language-Agnostic Sentence Representations). Our work attempts NLI in two ways, as binary classification and as multiclass classification. In both settings, the LASER-based sentence embedding approach outperformed the other techniques: by a margin of 12% in accuracy for multiclass classification and by 9% for binary classification.
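The abstract frames NLI as a directional classification over a (premise, hypothesis) pair of sentence embeddings, but it does not say how the two LASER vectors are combined or which classifier sits on top of them. The following Python sketch is a hypothetical illustration under stated assumptions: sentences are assumed to be already encoded into fixed-size vectors (no LASER encoder is called; toy 2-dimensional vectors stand in for real embeddings), each pair is combined as [u, v, |u - v|, u * v], a feature scheme common in sentence-embedding NLI work, and a simple nearest-centroid classifier stands in for whatever classifier the authors actually used. The names `pair_features`, `NearestCentroidNLI`, `TOY_PAIRS`, and `TOY_LABELS` are illustrative only.

```python
# Hypothetical sketch: NLI as classification over a pair of sentence embeddings.
# Assumptions (not from the paper): sentences are already encoded into
# fixed-size vectors (e.g. by a LASER-style encoder, not called here), pairs are
# combined as [u, v, |u - v|, u * v], and a nearest-centroid classifier stands
# in for the authors' actual classifier.
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def pair_features(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Combine premise embedding u with hypothesis embedding v.

    The order matters: NLI is directional (premise -> hypothesis), so
    swapping u and v generally yields a different feature vector.
    """
    if len(u) != len(v):
        raise ValueError("premise and hypothesis embeddings must have equal length")
    return (list(u) + list(v)
            + [abs(a - b) for a, b in zip(u, v)]
            + [a * b for a, b in zip(u, v)])


class NearestCentroidNLI:
    """Predicts the label whose mean pair-feature vector is closest (Euclidean)."""

    def __init__(self) -> None:
        self.centroids: Dict[str, Vector] = {}

    def fit(self, pairs: Sequence[Tuple[Vector, Vector]],
            labels: Sequence[str]) -> "NearestCentroidNLI":
        sums: Dict[str, Vector] = {}
        counts: Dict[str, int] = defaultdict(int)
        for (u, v), label in zip(pairs, labels):
            feats = pair_features(u, v)
            if label not in sums:
                sums[label] = [0.0] * len(feats)
            sums[label] = [s + f for s, f in zip(sums[label], feats)]
            counts[label] += 1
        # Each centroid is the per-dimension mean of that label's feature vectors.
        self.centroids = {lab: [s / counts[lab] for s in vec]
                          for lab, vec in sums.items()}
        return self

    def predict(self, u: Vector, v: Vector) -> str:
        feats = pair_features(u, v)
        return min(self.centroids, key=lambda lab: dist(feats, self.centroids[lab]))


# Hypothetical toy "embeddings" (2-d instead of LASER's 1024-d vectors).
TOY_PAIRS = [
    ([1.0, 0.0], [0.9, 0.1]), ([0.0, 1.0], [0.1, 0.9]),    # hypothesis close to premise
    ([1.0, 0.0], [-1.0, 0.0]), ([0.0, 1.0], [0.0, -1.0]),  # hypothesis opposite
    ([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0]),    # hypothesis unrelated
]
TOY_LABELS = ["entailment", "entailment", "contradiction", "contradiction",
              "neutral", "neutral"]

if __name__ == "__main__":
    multiclass = NearestCentroidNLI().fit(TOY_PAIRS, TOY_LABELS)
    print(multiclass.predict([1.0, 0.0], [0.8, 0.2]))   # entailment
    print(multiclass.predict([0.0, 1.0], [0.0, -0.9]))  # contradiction
    # Binary setting: same classifier, trained on two labels only.
    binary = NearestCentroidNLI().fit(TOY_PAIRS[:4], TOY_LABELS[:4])
    print(binary.predict([1.0, 0.0], [0.8, 0.2]))       # entailment
```

Fitting on only two labels gives a binary setting with no other code changes; the toy binary split here (entailment vs. contradiction) is an assumption, since the abstract does not state which label sets the paper's binary and multiclass tasks use.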
id pubmed-8114806
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
PeerJ Computer Science, published 2021-05-04. © 2021 Renjit and Idicula. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
topic Computational Linguistics