A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing

In clinical NLP, one major barrier to adopting crowdsourcing for NLP annotation is the issue of confidentiality for protected health information (PHI) in clinical narratives. In this paper, we investigated the use of a frequency-based approach to extract sentences without PHI. Our approach is based...

Bibliographic Details
Main Authors: Li, Dingcheng; Mojarad, Majid Rastegar; Li, Yanpeng; Sohn, Sunghwan; Mehrabi, Saeed; Elayavilli, Ravikumar Komandur; Yu, Yue; Liu, Hongfang
Format: Online Article Text
Language: English
Published: 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5859924/
https://www.ncbi.nlm.nih.gov/pubmed/26262333
_version_ 1783307927403102208
author Li, Dingcheng
Mojarad, Majid Rastegar
Li, Yanpeng
Sohn, Sunghwan
Mehrabi, Saeed
Elayavilli, Ravikumar Komandur
Yu, Yue
Liu, Hongfang
author_facet Li, Dingcheng
Mojarad, Majid Rastegar
Li, Yanpeng
Sohn, Sunghwan
Mehrabi, Saeed
Elayavilli, Ravikumar Komandur
Yu, Yue
Liu, Hongfang
author_sort Li, Dingcheng
collection PubMed
description In clinical NLP, one major barrier to adopting crowdsourcing for NLP annotation is the issue of confidentiality for protected health information (PHI) in clinical narratives. In this paper, we investigated the use of a frequency-based approach to extract sentences without PHI. Our approach is based on the assumption that sentences appearing frequently tend to contain no PHI. Both manual and automatic evaluations on 500 sentences out of the 7.9 million sentences of frequencies higher than one show that no PHI can be found among them. The promising results provide potentials of releasing those sentences for obtaining sentence-level NLP annotations via crowdsourcing.
format Online
Article
Text
id pubmed-5859924
institution National Center for Biotechnology Information
language English
publishDate 2015
record_format MEDLINE/PubMed
spelling pubmed-5859924
2018-03-20
A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing
Li, Dingcheng
Mojarad, Majid Rastegar
Li, Yanpeng
Sohn, Sunghwan
Mehrabi, Saeed
Elayavilli, Ravikumar Komandur
Yu, Yue
Liu, Hongfang
Stud Health Technol Inform
Article
In clinical NLP, one major barrier to adopting crowdsourcing for NLP annotation is the issue of confidentiality for protected health information (PHI) in clinical narratives. In this paper, we investigated the use of a frequency-based approach to extract sentences without PHI. Our approach is based on the assumption that sentences appearing frequently tend to contain no PHI. Both manual and automatic evaluations on 500 sentences out of the 7.9 million sentences of frequencies higher than one show that no PHI can be found among them. The promising results provide potentials of releasing those sentences for obtaining sentence-level NLP annotations via crowdsourcing.
2015
/pmc/articles/PMC5859924/
/pubmed/26262333
Text
en
This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing
title_full A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing
title_fullStr A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing
title_full_unstemmed A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing
title_short A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing
title_sort frequency-based strategy of obtaining sentences from clinical data repository for crowdsourcing
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5859924/
https://www.ncbi.nlm.nih.gov/pubmed/26262333
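
The description field above states the core assumption that sentences appearing frequently tend to contain no PHI, and that candidates were sentences with frequency higher than one. The record gives no implementation details, so the following is only a minimal sketch of that idea. The sentence splitter, the normalization step, the function names, and the sample notes are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def split_sentences(note):
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", note.strip())
    return [p for p in parts if p]


def normalize(sentence):
    """Lowercase and collapse whitespace so trivially different copies count together."""
    return " ".join(sentence.lower().split())


def frequent_sentences(notes, min_count=2):
    """Return {sentence: count} for sentences seen at least min_count times.

    min_count=2 mirrors "frequencies higher than one" in the description;
    the paper's actual counting procedure may differ.
    """
    counts = Counter(
        normalize(s) for note in notes for s in split_sentences(note)
    )
    return {s: c for s, c in counts.items() if c >= min_count}


# Invented example notes, for illustration only.
notes = [
    "Patient denies chest pain. Follow up in two weeks.",
    "Patient denies chest pain. Seen by Dr. Smith on 3/4.",
    "No acute distress. Follow up in two weeks.",
]
candidates = frequent_sentences(notes)
```

In this toy input, the sentence naming a clinician and a date occurs only once and is therefore not retained, while the two generic sentences occur twice and are. A frequency filter like this would still need the kind of manual and automatic PHI checks the description mentions before any release.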