
CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning

BACKGROUND: Literature about SARS-CoV-2 widely discusses the effects of variations that have spread in the past 3 years. Such information is dispersed in the texts of several research articles, hindering the possibility of practically integrating it with related datasets (e.g., millions of SARS-CoV-...


Bibliographic Details
Main Authors: Serna García, Giuseppe; Al Khalaf, Ruba; Invernici, Francesco; Ceri, Stefano; Bernasconi, Anna
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10205000/
https://www.ncbi.nlm.nih.gov/pubmed/37222749
http://dx.doi.org/10.1093/gigascience/giad036
author Serna García, Giuseppe
Al Khalaf, Ruba
Invernici, Francesco
Ceri, Stefano
Bernasconi, Anna
collection PubMed
description BACKGROUND: Literature about SARS-CoV-2 widely discusses the effects of variations that have spread in the past 3 years. Such information is dispersed in the texts of several research articles, hindering the possibility of practically integrating it with related datasets (e.g., millions of SARS-CoV-2 sequences available to the community). We aim to fill this gap, by mining literature abstracts to extract—for each variant/mutation—its related effects (in epidemiological, immunological, clinical, or viral kinetics terms) with labeled higher/lower levels in relation to the nonmutated virus. RESULTS: The proposed framework comprises (i) the provisioning of abstracts from a COVID-19–related big data corpus (CORD-19) and (ii) the identification of mutation/variant effects in abstracts using a GPT2-based prediction model. The above techniques enable the prediction of mutations/variants with their effects and levels in 2 distinct scenarios: (i) the batch annotation of the most relevant CORD-19 abstracts and (ii) the on-demand annotation of any user-selected CORD-19 abstract through the CoVEffect web application (http://gmql.eu/coveffect), which assists expert users with semiautomated data labeling. On the interface, users can inspect the predictions and correct them; user inputs can then extend the training dataset used by the prediction model. Our prototype model was trained through a carefully designed process, using a minimal and highly diversified pool of samples. CONCLUSIONS: The CoVEffect interface serves for the assisted annotation of abstracts, allowing the download of curated datasets for further use in data integration or analysis pipelines. The overall framework can be adapted to resolve similar unstructured-to-structured text translation tasks, which are typical of biomedical domains.
format Online
Article
Text
id pubmed-10205000
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10205000 2023-05-24 Gigascience Research Oxford University Press 2023-05-23 /pmc/articles/PMC10205000/ /pubmed/37222749 http://dx.doi.org/10.1093/gigascience/giad036 Text en © The Author(s) 2023. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10205000/
https://www.ncbi.nlm.nih.gov/pubmed/37222749
http://dx.doi.org/10.1093/gigascience/giad036