
MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature

In spite of the efforts in developing and maintaining accurate variant databases, a large number of disease-associated variants are still hidden in the biomedical literature. Curation of the biomedical literature in an effort to extract this information is a challenging task due to: (i) the complexi...


Bibliographic Details
Main Authors: Saberian, Nafiseh; Shafi, Adib; Peyvandipour, Azam; Draghici, Sorin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7378213/
https://www.ncbi.nlm.nih.gov/pubmed/32703994
http://dx.doi.org/10.1038/s41598-020-68649-0
_version_ 1783562366968922112
author Saberian, Nafiseh
Shafi, Adib
Peyvandipour, Azam
Draghici, Sorin
collection PubMed
description In spite of the efforts in developing and maintaining accurate variant databases, a large number of disease-associated variants are still hidden in the biomedical literature. Curation of the biomedical literature in an effort to extract this information is a challenging task due to: (i) the complexity of natural language processing, (ii) inconsistent use of standard recommendations for variant description, and (iii) the lack of clarity and consistency in describing the variant-genotype-phenotype associations in the biomedical literature. In this article, we employ text mining and word cloud analysis techniques to address these challenges. The proposed framework extracts the variant-gene-disease associations from the full-length biomedical literature and designs an evidence-based variant-driven gene panel for a given condition. We validate the identified genes by showing their diagnostic abilities to predict the patients’ clinical outcome on several independent validation cohorts. As representative examples, we present our results for acute myeloid leukemia (AML), breast cancer and prostate cancer. We compare these panels with other variant-driven gene panels obtained from ClinVar, Mastermind and others from the literature, as well as with a panel identified with a classical differentially expressed genes (DEGs) approach. The results show that the panels obtained by the proposed framework yield better results than the other gene panels currently available in the literature.
format Online
Article
Text
id pubmed-7378213
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7378213 2020-07-24 MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature Saberian, Nafiseh Shafi, Adib Peyvandipour, Azam Draghici, Sorin Sci Rep Article In spite of the efforts in developing and maintaining accurate variant databases, a large number of disease-associated variants are still hidden in the biomedical literature. Curation of the biomedical literature in an effort to extract this information is a challenging task due to: (i) the complexity of natural language processing, (ii) inconsistent use of standard recommendations for variant description, and (iii) the lack of clarity and consistency in describing the variant-genotype-phenotype associations in the biomedical literature. In this article, we employ text mining and word cloud analysis techniques to address these challenges. The proposed framework extracts the variant-gene-disease associations from the full-length biomedical literature and designs an evidence-based variant-driven gene panel for a given condition. We validate the identified genes by showing their diagnostic abilities to predict the patients’ clinical outcome on several independent validation cohorts. As representative examples, we present our results for acute myeloid leukemia (AML), breast cancer and prostate cancer. We compare these panels with other variant-driven gene panels obtained from ClinVar, Mastermind and others from the literature, as well as with a panel identified with a classical differentially expressed genes (DEGs) approach. The results show that the panels obtained by the proposed framework yield better results than the other gene panels currently available in the literature.
Nature Publishing Group UK 2020-07-23 /pmc/articles/PMC7378213/ /pubmed/32703994 http://dx.doi.org/10.1038/s41598-020-68649-0 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7378213/
https://www.ncbi.nlm.nih.gov/pubmed/32703994
http://dx.doi.org/10.1038/s41598-020-68649-0