Mining biomarker information in biomedical literature

Bibliographic Details
Main authors: Younesi, Erfan; Toldo, Luca; Müller, Bernd; Friedrich, Christoph M; Novac, Natalia; Scheer, Alexander; Hofmann-Apitius, Martin; Fluck, Juliane
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3541249/
https://www.ncbi.nlm.nih.gov/pubmed/23249606
http://dx.doi.org/10.1186/1472-6947-12-148
author Younesi, Erfan
Toldo, Luca
Müller, Bernd
Friedrich, Christoph M
Novac, Natalia
Scheer, Alexander
Hofmann-Apitius, Martin
Fluck, Juliane
collection PubMed
description BACKGROUND: For the selection and evaluation of potential biomarkers, inclusion of already published information is of utmost importance. Despite significant advances in text- and data-mining techniques, the vast knowledge space of biomarkers in biomedical text has remained unexplored. Existing named entity recognition approaches are not sufficiently selective for retrieving biomarker information from the literature. The purpose of this study was to identify textual features that enhance the effectiveness of biomarker information retrieval for different indication areas and diverse end-user perspectives. METHODS: A biomarker terminology was created and organized into six concept classes. The performance of this terminology was optimized toward a balance between selectivity and specificity. Information retrieval performance using the biomarker terminology was evaluated for various combinations of the terminology's six classes. These results were further validated on two independent corpora representing two different neurodegenerative diseases. RESULTS: The current version of the biomarker terminology contains 119 entity classes supported by 1890 different synonyms. The information retrieval results show an improved retrieval rate for informative abstracts, achieved by including clinical management terms and evidence of gene/protein alterations (e.g. gene/protein expression status or certain polymorphisms) in combination with disease and gene name recognition. When additional filtering through other classes (e.g. diagnostic or prognostic methods) is applied, the typically high number of nonspecific search results is significantly reduced. The evaluation results suggest that this approach enables the automated identification of biomarker information in the literature.
A demo version of the search engine SCAIView, including the biomarker retrieval, is publicly available at http://www.scaiview.com/scaiview-academia.html. CONCLUSIONS: The approach presented in this paper demonstrates that using a dedicated biomarker terminology for automated analysis of the scientific literature may be helpful as an aid to finding biomarker information in text. Successful extraction of candidate biomarker information from published resources can be considered the first step towards developing novel hypotheses. These hypotheses will be valuable for early decision-making in the drug discovery and development process.
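The class-combination filtering described in the abstract can be illustrated with a small, hypothetical sketch. It assumes a toy terminology of five concept classes (disease, gene, alteration, clinical_management, method) with a handful of made-up synonyms, and simple case-insensitive whole-word regular-expression matching. None of these class names, terms, or the matching rule come from the published biomarker terminology or from SCAIView. An abstract is retrieved only if it mentions at least one synonym from every required class, so adding a class to the requirement can only narrow the result set.

```python
import re
from typing import Dict, Iterable, List, Set

# HYPOTHETICAL toy terminology (concept class -> synonyms).
# Placeholder terms for illustration only; not the published biomarker terminology.
TERMINOLOGY: Dict[str, List[str]] = {
    "disease": ["alzheimer's disease", "parkinson's disease"],
    "gene": ["apoe", "snca", "app"],
    "alteration": ["overexpression", "polymorphism", "mutation", "expression level"],
    "clinical_management": ["diagnosis", "prognosis", "treatment response"],
    "method": ["diagnostic assay", "prognostic marker", "imaging"],
}


def compile_class(synonyms: Iterable[str]) -> re.Pattern:
    """Build one case-insensitive, whole-word pattern matching any synonym of a class."""
    alternatives = "|".join(re.escape(s) for s in synonyms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PATTERNS: Dict[str, re.Pattern] = {
    cls: compile_class(syns) for cls, syns in TERMINOLOGY.items()
}


def classes_in(text: str) -> Set[str]:
    """Return the set of concept classes with at least one synonym in the text."""
    return {cls for cls, pattern in PATTERNS.items() if pattern.search(text)}


def retrieve(abstracts: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Keep abstracts that mention every required class (logical AND over classes)."""
    needed = set(required)
    return [doc_id for doc_id, text in abstracts.items() if needed <= classes_in(text)]


# Made-up example abstracts, not taken from the study's corpora.
EXAMPLE_ABSTRACTS: Dict[str, str] = {
    "a1": "APOE polymorphism is associated with diagnosis of Alzheimer's disease.",
    "a2": "SNCA is a gene expressed in the brain in Parkinson's disease.",
    "a3": "Overexpression of APP was linked to prognosis; a diagnostic assay "
          "was evaluated in Alzheimer's disease.",
}
BASE_CLASSES = ["disease", "gene", "alteration", "clinical_management"]

if __name__ == "__main__":
    print(retrieve(EXAMPLE_ABSTRACTS, BASE_CLASSES))               # ['a1', 'a3']
    print(retrieve(EXAMPLE_ABSTRACTS, BASE_CLASSES + ["method"]))  # ['a3']
```

Under these assumptions, requiring disease, gene, alteration and clinical management terms keeps a1 and a3 (a2 names a gene and a disease but no alteration or clinical term), and additionally requiring a method term keeps only a3. This is a miniature of the narrowing effect the abstract reports, not a reproduction of the study's evaluation.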
format Online
Article
Text
id pubmed-3541249
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3541249 2013-01-11 Mining biomarker information in biomedical literature. BMC Med Inform Decis Mak, Technical Advance. BioMed Central, 2012-12-18. Text, en. Copyright ©2012 Younesi et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Mining biomarker information in biomedical literature
topic Technical Advance