Automated PDF highlighting to support faster curation of literature for Parkinson’s and Alzheimer’s disease

Neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY proje...

Bibliographic Details
Main authors: Wu, Honghan; Oellrich, Anika; Girges, Christine; de Bono, Bernard; Hubbard, Tim J.P.; Dobson, Richard J.B.
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467557/
https://www.ncbi.nlm.nih.gov/pubmed/28365743
http://dx.doi.org/10.1093/database/bax027
_version_ 1783243288233377792
author Wu, Honghan
Oellrich, Anika
Girges, Christine
de Bono, Bernard
Hubbard, Tim J.P.
Dobson, Richard J.B.
author_facet Wu, Honghan
Oellrich, Anika
Girges, Christine
de Bono, Bernard
Hubbard, Tim J.P.
Dobson, Richard J.B.
author_sort Wu, Honghan
collection PubMed
description Neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which reveals that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of the cases, the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process. Database URL: https://github.com/KHP-Informatics/NapEasy
format Online
Article
Text
id pubmed-5467557
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5467557 2017-06-19 Automated PDF highlighting to support faster curation of literature for Parkinson’s and Alzheimer’s disease Wu, Honghan Oellrich, Anika Girges, Christine de Bono, Bernard Hubbard, Tim J.P. Dobson, Richard J.B. Database (Oxford) Original Article Neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which reveals that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of the cases, the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process.
Database URL: https://github.com/KHP-Informatics/NapEasy Oxford University Press 2017-03-27 /pmc/articles/PMC5467557/ /pubmed/28365743 http://dx.doi.org/10.1093/database/bax027 Text en © The Author(s) 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Article
Wu, Honghan
Oellrich, Anika
Girges, Christine
de Bono, Bernard
Hubbard, Tim J.P.
Dobson, Richard J.B.
Automated PDF highlighting to support faster curation of literature for Parkinson’s and Alzheimer’s disease
title Automated PDF highlighting to support faster curation of literature for Parkinson’s and Alzheimer’s disease
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467557/
https://www.ncbi.nlm.nih.gov/pubmed/28365743
http://dx.doi.org/10.1093/database/bax027