KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences
BACKGROUND: Biomedical knowledge bases (KBs) have become important assets in life sciences. Prior work on KB construction has three major limitations. First, most biomedical KBs are manually built and curated, and cannot keep up with the rate at which new findings are published. Second, for automat...
Main Authors: | Ernst, Patrick; Siu, Amy; Weikum, Gerhard |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448285/ https://www.ncbi.nlm.nih.gov/pubmed/25971816 http://dx.doi.org/10.1186/s12859-015-0549-5 |
_version_ | 1782373687733780480 |
---|---|
author | Ernst, Patrick Siu, Amy Weikum, Gerhard |
collection | PubMed |
description | BACKGROUND: Biomedical knowledge bases (KBs) have become important assets in life sciences. Prior work on KB construction has three major limitations. First, most biomedical KBs are manually built and curated, and cannot keep up with the rate at which new findings are published. Second, for automatic information extraction (IE), the text genre of choice has been scientific publications, neglecting sources like health portals and online communities. Third, most prior work on IE has focused on the molecular level or chemogenomics only, like protein-protein interactions or gene-drug relationships, or has solely addressed highly specific topics such as drug effects. RESULTS: We address these three limitations by a versatile and scalable approach to automatic KB construction. Using a small number of seed facts for distant supervision of pattern-based extraction, we harvest a huge number of facts in an automated manner without requiring any explicit training. We extend previous techniques for pattern-based IE with confidence statistics, and we combine this recall-oriented stage with logical reasoning for consistency constraint checking to achieve high precision. To our knowledge, this is the first method that uses consistency checking for biomedical relations. Our approach can be easily extended to incorporate additional relations and constraints. We ran extensive experiments not only for scientific publications, but also for encyclopedic health portals and online communities, creating different KBs based on different configurations. We assess the size and quality of each KB, in terms of number of facts and precision. The best configured KB, KnowLife, contains more than 500,000 facts at a precision of 93% for 13 relations covering genes, organs, diseases, symptoms, treatments, as well as environmental and lifestyle risk factors. CONCLUSION: KnowLife is a large knowledge base for health and life sciences, automatically constructed from different Web sources. As a unique feature, KnowLife is harvested from different text genres such as scientific publications, health portals, and online communities. Thus, it has the potential to serve as a one-stop portal for a wide range of relations and use cases. To showcase the breadth and usefulness, we make the KnowLife KB accessible through the health portal (http://knowlife.mpi-inf.mpg.de). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0549-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4448285 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4448285 2015-05-30 KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences Ernst, Patrick Siu, Amy Weikum, Gerhard BMC Bioinformatics Research Article BioMed Central 2015-05-14 /pmc/articles/PMC4448285/ /pubmed/25971816 http://dx.doi.org/10.1186/s12859-015-0549-5 Text en © Ernst et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448285/ https://www.ncbi.nlm.nih.gov/pubmed/25971816 http://dx.doi.org/10.1186/s12859-015-0549-5 |