COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization

The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. Throughout 2020, over 400,000 coronavirus-related publications have been col...

Bibliographic Details
Main Authors:	Esteva, Andre; Kale, Anuprit; Paulus, Romain; Hashimoto, Kazuma; Yin, Wenpeng; Radev, Dragomir; Socher, Richard
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041998/ https://www.ncbi.nlm.nih.gov/pubmed/33846532 http://dx.doi.org/10.1038/s41746-021-00437-0

_version_	1783678040339906560
author	Esteva, Andre; Kale, Anuprit; Paulus, Romain; Hashimoto, Kazuma; Yin, Wenpeng; Radev, Dragomir; Socher, Richard
author_facet	Esteva, Andre; Kale, Anuprit; Paulus, Romain; Hashimoto, Kazuma; Yin, Wenpeng; Radev, Dragomir; Socher, Richard
author_sort	Esteva, Andre
collection	PubMed
description	The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. Throughout 2020, over 400,000 coronavirus-related publications have been collected through the COVID-19 Open Research Dataset. Here, we present CO-Search, a semantic, multi-stage, search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers and avoiding misinformation during a time of crisis. CO-Search is built from two sequential parts: a hybrid semantic-keyword retriever, which takes an input query and returns a sorted list of the 1000 most relevant documents, and a re-ranker, which further orders them by relevance. The retriever is composed of a deep learning model (Siamese-BERT) that encodes query-level meaning, along with two keyword-based models (BM25, TF-IDF) that emphasize the most important words of a query. The re-ranker assigns a relevance score to each document, computed from the outputs of (1) a question–answering module which gauges how much each document answers the query, and (2) an abstractive summarization module which determines how well a query matches a generated summary of the document. To account for the relatively limited dataset, we develop a text augmentation technique which splits the documents into pairs of paragraphs and the citations contained in them, creating millions of (citation title, paragraph) tuples for training the retriever. We evaluate our system (http://einstein.ai/covid) on the data of the TREC-COVID information retrieval challenge, obtaining strong performance across multiple key information retrieval metrics.
format	Online Article Text
id	pubmed-8041998
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-8041998 2021-04-28 COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization Esteva, Andre; Kale, Anuprit; Paulus, Romain; Hashimoto, Kazuma; Yin, Wenpeng; Radev, Dragomir; Socher, Richard NPJ Digit Med Article Nature Publishing Group UK 2021-04-12 /pmc/articles/PMC8041998/ /pubmed/33846532 http://dx.doi.org/10.1038/s41746-021-00437-0 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041998/ https://www.ncbi.nlm.nih.gov/pubmed/33846532 http://dx.doi.org/10.1038/s41746-021-00437-0
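
The description field above states that the retriever is trained on millions of (citation title, paragraph) tuples obtained by pairing each paragraph with the citations it contains. The following is a minimal sketch of that pairing step only, not the authors' implementation. The `Paragraph` class, the `build_training_pairs` function, and the input layout (a list of paragraphs with cited reference IDs, plus a bibliography mapping IDs to titles) are all hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    """A paragraph and the IDs of the references it cites (hypothetical schema)."""
    text: str
    cited_ids: list[str] = field(default_factory=list)


def build_training_pairs(paragraphs, bibliography):
    """Return (citation title, paragraph text) tuples.

    One tuple is produced for every citation in a paragraph whose title
    can be resolved in `bibliography` (a dict of reference ID -> title).
    Paragraphs without citations contribute nothing.
    """
    pairs = []
    for para in paragraphs:
        for ref_id in para.cited_ids:
            title = bibliography.get(ref_id)
            if title:  # skip citations with no known title
                pairs.append((title, para.text))
    return pairs


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    bibliography = {"b1": "Title A", "b2": "Title B"}
    paragraphs = [
        Paragraph("Paragraph one cites two works.", ["b1", "b2"]),
        Paragraph("Paragraph two cites nothing."),
        Paragraph("Paragraph three cites an unresolved work.", ["b9"]),
    ]
    for title, text in build_training_pairs(paragraphs, bibliography):
        print(f"{title!r} -> {text!r}")
```

In this sketch a paragraph citing several works yields several tuples, which is one plausible way a limited corpus could expand into a much larger set of training pairs; how the published system filters or samples such pairs is not stated in this record.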
