Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids

BACKGROUND: Computational methods for mining of biomedical literature can be useful in augmenting manual searches of the literature using keywords for disease-specific biomarker discovery from biofluids. In this work, we develop and apply a semi-automated literature mining method to mine abstracts o...

Bibliographic Details
Main Authors:	Jordan, Rick; Visweswaran, Shyam; Gopalakrishnan, Vanathi
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4215335/ https://www.ncbi.nlm.nih.gov/pubmed/25379168 http://dx.doi.org/10.1186/2043-9113-4-13

author	Jordan, Rick; Visweswaran, Shyam; Gopalakrishnan, Vanathi
author_sort	Jordan, Rick
collection	PubMed
description	BACKGROUND: Computational methods for mining of biomedical literature can be useful in augmenting manual searches of the literature using keywords for disease-specific biomarker discovery from biofluids. In this work, we develop and apply a semi-automated literature mining method to mine abstracts obtained from PubMed to discover putative biomarkers of breast and lung cancers in specific biofluids.
METHODOLOGY: A positive set of abstracts was defined by the terms ‘breast cancer’ and ‘lung cancer’ in conjunction with 14 separate ‘biofluids’ (bile, blood, breastmilk, cerebrospinal fluid, mucus, plasma, saliva, semen, serum, synovial fluid, stool, sweat, tears, and urine), while a negative set of abstracts was defined by the terms ‘(biofluid) NOT breast cancer’ or ‘(biofluid) NOT lung cancer.’ More than 5.3 million total abstracts were obtained from PubMed and examined for biomarker-disease-biofluid associations (34,296 positive and 2,653,396 negative for breast cancer; 28,355 positive and 2,595,034 negative for lung cancer). Biological entities such as genes and proteins were tagged using ABNER, and processed using Python scripts to produce a list of putative biomarkers. Z-scores were calculated, ranked, and used to determine significance of putative biomarkers found. Manual verification of relevant abstracts was performed to assess our method’s performance.
RESULTS: Biofluid-specific markers were identified from the literature, assigned relevance scores based on frequency of occurrence, and validated using known biomarker lists and/or databases for lung and breast cancer [NCBI’s On-line Mendelian Inheritance in Man (OMIM), Cancer Gene annotation server for cancer genomics (CAGE), NCBI’s Genes & Disease, NCI’s Early Detection Research Network (EDRN), and others]. The specificity of each marker for a given biofluid was calculated, and the performance of our semi-automated literature mining method was assessed for breast and lung cancer.
CONCLUSIONS: We developed a semi-automated process for determining a list of putative biomarkers for breast and lung cancer. New knowledge is presented in the form of biomarker lists; ranked, newly discovered biomarker-disease-biofluid relationships; and biomarker specificity across biofluids.
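The query construction and Z-score ranking described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' exact formula, so the two-proportion Z-score below is an assumption, as are the function names (`build_queries`, `z_score`, `rank_entities`), the placeholder entity names, and the per-entity counts. Only the biofluid list and the breast cancer abstract totals come from the record itself.

```python
import math

# The 14 biofluids listed in the abstract.
BIOFLUIDS = [
    "bile", "blood", "breastmilk", "cerebrospinal fluid", "mucus", "plasma",
    "saliva", "semen", "serum", "synovial fluid", "stool", "sweat", "tears",
    "urine",
]


def build_queries(disease, biofluid):
    """Return (positive, negative) Boolean query strings.

    Mirrors the abstract's description: positive = disease with biofluid,
    negative = '(biofluid) NOT disease'. The exact quoting is an assumption.
    """
    positive = f'"{disease}" AND "{biofluid}"'
    negative = f'"{biofluid}" NOT "{disease}"'
    return positive, negative


def z_score(pos_hits, pos_total, neg_hits, neg_total):
    """Two-proportion Z-score for one tagged entity (assumed formula).

    Compares the fraction of positive-set abstracts mentioning the entity
    with the fraction of negative-set abstracts mentioning it.
    """
    p_pos = pos_hits / pos_total
    p_neg = neg_hits / neg_total
    pooled = (pos_hits + neg_hits) / (pos_total + neg_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / pos_total + 1 / neg_total))
    # Guard against entities that appear in no abstract or in every abstract.
    return (p_pos - p_neg) / std_err if std_err > 0 else 0.0


def rank_entities(pos_counts, neg_counts, pos_total, neg_total):
    """Score every entity seen in the positive set; highest Z-score first."""
    scored = [
        (name, z_score(hits, pos_total, neg_counts.get(name, 0), neg_total))
        for name, hits in pos_counts.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Abstract totals for breast cancer, as reported in the record.
POS_TOTAL, NEG_TOTAL = 34_296, 2_653_396

# Hypothetical per-entity abstract counts, invented purely for illustration.
pos_counts = {"ENTITY_A": 120, "ENTITY_B": 5}
neg_counts = {"ENTITY_A": 300, "ENTITY_B": 4000}

for name, score in rank_entities(pos_counts, neg_counts, POS_TOTAL, NEG_TOTAL):
    print(f"{name}\t{score:.2f}")
```

An entity that is mentioned proportionally more often in the positive set than in the negative set receives a positive score and rises in the ranking. In the described workflow, the counts would come from entities tagged by ABNER in each set of abstracts rather than from hand-entered dictionaries.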
format	Online Article Text
id	pubmed-4215335
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4215335 2014-11-06 Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids Jordan, Rick; Visweswaran, Shyam; Gopalakrishnan, Vanathi J Clin Bioinforma Research BioMed Central 2014-10-23 /pmc/articles/PMC4215335/ /pubmed/25379168 http://dx.doi.org/10.1186/2043-9113-4-13 Text en Copyright © 2014 Jordan et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4215335/ https://www.ncbi.nlm.nih.gov/pubmed/25379168 http://dx.doi.org/10.1186/2043-9113-4-13
