Natural language inference for Malayalam language using language agnostic sentence representation
Natural language inference (NLI) is an essential subtask in many natural language processing applications. It is a directional relationship from premise to hypothesis. A pair of texts is defined as entailed if a text infers its meaning from the other text. The NLI is also known as textual entailment...
Main Authors: | Renjit, Sara; Idicula, Sumam |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Computational Linguistics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8114806/ https://www.ncbi.nlm.nih.gov/pubmed/34013034 http://dx.doi.org/10.7717/peerj-cs.508 |
_version_ | 1783691121246863360 |
---|---|
author | Renjit, Sara; Idicula, Sumam |
author_facet | Renjit, Sara; Idicula, Sumam |
author_sort | Renjit, Sara |
collection | PubMed |
description | Natural language inference (NLI) is an essential subtask in many natural language processing applications. It is a directional relationship from premise to hypothesis. A pair of texts is defined as entailed if a text infers its meaning from the other text. The NLI is also known as textual entailment recognition, and it recognizes entailed and contradictory sentences in various NLP systems like Question Answering, Summarization and Information retrieval systems. This paper describes the NLI problem attempted for a low resource Indian language Malayalam, the regional language of Kerala. More than 30 million people speak this language. The paper is about the Malayalam NLI dataset, named MaNLI dataset, and its application of NLI in Malayalam language using different models, namely Doc2Vec (paragraph vector), fastText, BERT (Bidirectional Encoder Representation from Transformers), and LASER (Language Agnostic Sentence Representation). Our work attempts NLI in two ways, as binary classification and as multiclass classification. For both the classifications, LASER outperformed the other techniques. For multiclass classification, NLI using LASER based sentence embedding technique outperformed the other techniques by a significant margin of 12% accuracy. There was also an accuracy improvement of 9% for LASER based NLI system for binary classification over the other techniques. |
format | Online Article Text |
id | pubmed-8114806 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8114806 2021-05-18 Natural language inference for Malayalam language using language agnostic sentence representation Renjit, Sara Idicula, Sumam PeerJ Comput Sci Computational Linguistics Natural language inference (NLI) is an essential subtask in many natural language processing applications. It is a directional relationship from premise to hypothesis. A pair of texts is defined as entailed if a text infers its meaning from the other text. The NLI is also known as textual entailment recognition, and it recognizes entailed and contradictory sentences in various NLP systems like Question Answering, Summarization and Information retrieval systems. This paper describes the NLI problem attempted for a low resource Indian language Malayalam, the regional language of Kerala. More than 30 million people speak this language. The paper is about the Malayalam NLI dataset, named MaNLI dataset, and its application of NLI in Malayalam language using different models, namely Doc2Vec (paragraph vector), fastText, BERT (Bidirectional Encoder Representation from Transformers), and LASER (Language Agnostic Sentence Representation). Our work attempts NLI in two ways, as binary classification and as multiclass classification. For both the classifications, LASER outperformed the other techniques. For multiclass classification, NLI using LASER based sentence embedding technique outperformed the other techniques by a significant margin of 12% accuracy. There was also an accuracy improvement of 9% for LASER based NLI system for binary classification over the other techniques. PeerJ Inc. 2021-05-04 /pmc/articles/PMC8114806/ /pubmed/34013034 http://dx.doi.org/10.7717/peerj-cs.508 Text en © 2021 Renjit and Idicula https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Computational Linguistics Renjit, Sara Idicula, Sumam Natural language inference for Malayalam language using language agnostic sentence representation |
title | Natural language inference for Malayalam language using language agnostic sentence representation |
title_full | Natural language inference for Malayalam language using language agnostic sentence representation |
title_fullStr | Natural language inference for Malayalam language using language agnostic sentence representation |
title_full_unstemmed | Natural language inference for Malayalam language using language agnostic sentence representation |
title_short | Natural language inference for Malayalam language using language agnostic sentence representation |
title_sort | natural language inference for malayalam language using language agnostic sentence representation |
topic | Computational Linguistics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8114806/ https://www.ncbi.nlm.nih.gov/pubmed/34013034 http://dx.doi.org/10.7717/peerj-cs.508 |
work_keys_str_mv | AT renjitsara naturallanguageinferenceformalayalamlanguageusinglanguageagnosticsentencerepresentation AT idiculasumam naturallanguageinferenceformalayalamlanguageusinglanguageagnosticsentencerepresentation |