
Artificial Intelligence-Driven Structurization of Diagnostic Information in Free-Text Pathology Reports

Bibliographic Details
Main Authors: Giannaris, Pericles S., Al-Taie, Zainab, Kovalenko, Mikhail, Thanintorn, Nattapon, Kholod, Olha, Innokenteva, Yulia, Coberly, Emily, Frazier, Shellaine, Laziuk, Katsiarina, Popescu, Mihail, Shyu, Chi-Ren, Xu, Dong, Hammer, Richard D., Shin, Dmitriy
Format: Online Article Text
Language: English
Published: Wolters Kluwer - Medknow 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7045509/
https://www.ncbi.nlm.nih.gov/pubmed/32166042
http://dx.doi.org/10.4103/jpi.jpi_30_19
Description
Summary: BACKGROUND: Free-text sections of pathology reports contain the most important information from a diagnostic standpoint. However, this information is largely underutilized in computer-based analytics. The vast majority of natural language processing (NLP)-based methods lack the capacity to accurately extract complex diagnostic entities and the relationships among them, and to provide an adequate knowledge representation for downstream data-mining applications. METHODS: In this paper, we introduce a novel informatics pipeline that extends open information extraction (openIE) techniques with artificial intelligence (AI)-based modeling to extract complex diagnostic entities and the relationships among them and transform them into Knowledge Graphs (KGs) of relational triples (RTs). RESULTS: Evaluation studies demonstrated that the pipeline's output differs significantly from that of a random process. Semantic similarity with the original reports is high (Mean Weighted Overlap of 0.83). Based on experts' assessment, the precision and recall of the extracted RTs were 0.925 and 0.841, respectively (P < 0.0001). Inter-rater agreement was significant at 93.6%, and inter-rater reliability was 81.8%. CONCLUSION: The results demonstrated important properties of the pipeline, such as high accuracy, minimality, and adequate knowledge representation. Therefore, we conclude that the pipeline can be used in various downstream data-mining applications to assist diagnostic medicine.
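To make the summary's terms concrete, here is a minimal sketch of what a knowledge graph of relational triples (subject, predicate, object) looks like as a data structure, and how precision and recall over extracted triples are defined. This is not the authors' pipeline. The triples are hypothetical, and the names `Triple`, `build_graph`, and `precision_recall` are illustrative. The paper scored triples by expert assessment. As a simplifying assumption, the sketch instead counts a triple as correct only when it exactly matches a reference triple.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A relational triple (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


def build_graph(triples):
    """Index triples as a directed, edge-labeled graph: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


def precision_recall(extracted, reference):
    """Precision and recall of extracted triples against a reference set.

    Assumption: a triple is correct only if it exactly matches a reference
    triple (the paper relied on expert judgment instead).
    """
    extracted, reference = set(extracted), set(reference)
    true_pos = len(extracted & reference)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical triples, for illustration only (not taken from the paper).
reference = [
    Triple("bone marrow", "shows", "hypercellularity"),
    Triple("hypercellularity", "consistent with", "myeloproliferative neoplasm"),
    Triple("blasts", "not increased in", "bone marrow"),
]
extracted = [
    Triple("bone marrow", "shows", "hypercellularity"),
    Triple("hypercellularity", "consistent with", "myeloproliferative neoplasm"),
    Triple("bone marrow", "shows", "fibrosis"),
]

graph = build_graph(extracted)
p, r = precision_recall(extracted, reference)
print(graph["bone marrow"])  # [('shows', 'hypercellularity'), ('shows', 'fibrosis')]
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.667
```

Here two of the three extracted triples match the reference, so precision (correct / extracted) and recall (correct / expected) are both 2/3. Keying the graph on the subject supports simple traversals, such as listing every finding attached to a specimen.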