DEXTER: Disease-Expression Relation Extraction from Text

Bibliographic Details
Main authors: Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K
Format: Online Article Text
Language: English
Published: Oxford University Press, 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007211/
https://www.ncbi.nlm.nih.gov/pubmed/29860481
http://dx.doi.org/10.1093/database/bay045
author Gupta, Samir
Dingerdissen, Hayley
Ross, Karen E
Hu, Yu
Wu, Cathy H
Mazumder, Raja
Vijay-Shanker, K
collection PubMed
description Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research and for disease diagnosis and prognosis. There are currently several high-quality databases that capture gene expression information in the context of disease, obtained mostly from large-scale studies using technologies such as microarrays and next-generation sequencing. The scientific literature is another rich source of information on gene expression–disease relationships, which have been captured not only in large-scale studies but also in thousands of small-scale studies. Expression information obtained from the literature through manual curation can extend expression databases. While many of the existing databases include information from the literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER), to extract information from the literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags significantly behind the expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51% and 81.81%, respectively.
In addition, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNAs in 171 cancers. All extractions using DEXTER are integrated into the literature-based portion of BioXpress. Database URL: http://biotm.cis.udel.edu/DEXTER
format Online
Article
Text
id pubmed-6007211
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6007211 2018-06-25; DEXTER: Disease-Expression Relation Extraction from Text; Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K; Database (Oxford); Original Article; Oxford University Press 2018-05-30; /pmc/articles/PMC6007211/; /pubmed/29860481; http://dx.doi.org/10.1093/database/bay045; Text en; © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title DEXTER: Disease-Expression Relation Extraction from Text
topic Original Article