
Imitating Manual Curation of Text-Mined Facts in Biomedicine

Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts—to resolve data conflicts and inc...


Bibliographic Details
Main Authors: Rodriguez-Esteban, Raul, Iossifov, Ivan, Rzhetsky, Andrey
Format: Text
Language: English
Published: Public Library of Science 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1560402/
https://www.ncbi.nlm.nih.gov/pubmed/16965176
http://dx.doi.org/10.1371/journal.pcbi.0020118
author Rodriguez-Esteban, Raul
Iossifov, Ivan
Rzhetsky, Andrey
collection PubMed
description Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts—to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.
format Text
id pubmed-1560402
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1560402 2006-10-02 Imitating Manual Curation of Text-Mined Facts in Biomedicine. Rodriguez-Esteban, Raul; Iossifov, Ivan; Rzhetsky, Andrey. PLoS Comput Biol. Research Article. Public Library of Science 2006-09 2006-09-08 /pmc/articles/PMC1560402/ /pubmed/16965176 http://dx.doi.org/10.1371/journal.pcbi.0020118 Text en © 2006 Rodriguez-Esteban et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Imitating Manual Curation of Text-Mined Facts in Biomedicine
topic Research Article