Indirect association and ranking hypotheses for literature based discovery

Bibliographic Details
Main authors: Henry, Sam; McInnes, Bridget T.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6694578/
https://www.ncbi.nlm.nih.gov/pubmed/31416434
http://dx.doi.org/10.1186/s12859-019-2989-9
_version_ 1783443853903134720
author Henry, Sam
McInnes, Bridget T.
author_facet Henry, Sam
McInnes, Bridget T.
author_sort Henry, Sam
collection PubMed
description BACKGROUND: Literature Based Discovery (LBD) produces more potential hypotheses than can be manually reviewed, making automatic ranking of these hypotheses critical. In this paper, we introduce the indirect association measures of Linking Term Association (LTA), Minimum Weight Association (MWA), and Shared B to C Set Association (SBC), and compare them to Linking Set Association (LSA), concept embeddings vector cosine, Linking Term Count (LTC), and direct co-occurrence vector cosine. Our proposed indirect association measures extend traditional association measures to quantify indirect rather than direct associations while preserving valuable statistical properties. RESULTS: We compare several hypothesis ranking methods for LBD against our proposed indirect association measures. We intrinsically evaluate each method by its ability to estimate semantic relatedness on standard evaluation datasets. We extrinsically evaluate each method’s ability to rank hypotheses in LBD using a time-slicing dataset based on co-occurrence information and another time-slicing dataset based on SemRep-extracted relationships. Precision and recall curves are generated by ranking term pairs and applying a threshold at each rank. CONCLUSIONS: Results differ depending on the evaluation methods and datasets, but it is unclear whether this results from biases in the evaluation datasets or whether one method is truly better than another. We conclude that LTC and SBC are the best-suited methods for hypothesis ranking in LBD, but there is value in having a variety of methods to choose from.
format Online
Article
Text
id pubmed-6694578
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6694578 2019-08-19 Indirect association and ranking hypotheses for literature based discovery Henry, Sam McInnes, Bridget T. BMC Bioinformatics Methodology Article
BioMed Central 2019-08-15 /pmc/articles/PMC6694578/ /pubmed/31416434 http://dx.doi.org/10.1186/s12859-019-2989-9 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Henry, Sam
McInnes, Bridget T.
Indirect association and ranking hypotheses for literature based discovery
title Indirect association and ranking hypotheses for literature based discovery
title_full Indirect association and ranking hypotheses for literature based discovery
title_fullStr Indirect association and ranking hypotheses for literature based discovery
title_full_unstemmed Indirect association and ranking hypotheses for literature based discovery
title_short Indirect association and ranking hypotheses for literature based discovery
title_sort indirect association and ranking hypotheses for literature based discovery
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6694578/
https://www.ncbi.nlm.nih.gov/pubmed/31416434
http://dx.doi.org/10.1186/s12859-019-2989-9
work_keys_str_mv AT henrysam indirectassociationandrankinghypothesesforliteraturebaseddiscovery
AT mcinnesbridgett indirectassociationandrankinghypothesesforliteraturebaseddiscovery