Cargando…

Building a large gene expression-cancer knowledge base with limited human annotations

Cancer prevention is one of the most pressing challenges that public health needs to face. In this regard, data-driven research is central to assist medical solutions targeting cancer. To fully harness the power of data-driven research, it is imperative to have well-organized machine-readable facts...

Descripción completa

Detalles Bibliográficos
Autores principales: Marchesin, Stefano, Menotti, Laura, Giachelle, Fabio, Silvello, Gianmaria, Alonso, Omar
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10533344/
https://www.ncbi.nlm.nih.gov/pubmed/37768281
http://dx.doi.org/10.1093/database/baad061
Descripción
Sumario:Cancer prevention is one of the most pressing challenges that public health needs to face. In this regard, data-driven research is central to assist medical solutions targeting cancer. To fully harness the power of data-driven research, it is imperative to have well-organized machine-readable facts into a knowledge base (KB). Motivated by this urgent need, we introduce the Collaborative Oriented Relation Extraction (CORE) system for building KBs with limited manual annotations. CORE is based on the combination of distant supervision and active learning paradigms and offers a seamless, transparent, modular architecture equipped for large-scale processing. We focus on precision medicine and build the largest KB on ‘fine-grained’ gene expression–cancer associations—a key to complement and validate experimental data for cancer research. We show the robustness of CORE and discuss the usefulness of the provided KB. Database URL https://zenodo.org/record/7577127