A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing
In clinical NLP, one major barrier to adopting crowdsourcing for NLP annotation is the issue of confidentiality for protected health information (PHI) in clinical narratives. In this paper, we investigated the use of a frequency-based approach to extract sentences without PHI. Our approach is based...
Main Authors: | Li, Dingcheng; Mojarad, Majid Rastegar; Li, Yanpeng; Sohn, Sunghwan; Mehrabi, Saeed; Elayavilli, Ravikumar Komandur; Yu, Yue; Liu, Hongfang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5859924/ https://www.ncbi.nlm.nih.gov/pubmed/26262333 |
_version_ | 1783307927403102208 |
---|---|
author | Li, Dingcheng; Mojarad, Majid Rastegar; Li, Yanpeng; Sohn, Sunghwan; Mehrabi, Saeed; Elayavilli, Ravikumar Komandur; Yu, Yue; Liu, Hongfang |
author_facet | Li, Dingcheng; Mojarad, Majid Rastegar; Li, Yanpeng; Sohn, Sunghwan; Mehrabi, Saeed; Elayavilli, Ravikumar Komandur; Yu, Yue; Liu, Hongfang |
author_sort | Li, Dingcheng |
collection | PubMed |
description | In clinical NLP, one major barrier to adopting crowdsourcing for NLP annotation is the confidentiality of protected health information (PHI) in clinical narratives. In this paper, we investigated the use of a frequency-based approach to extract sentences without PHI. Our approach is based on the assumption that sentences appearing frequently tend to contain no PHI. Both manual and automatic evaluations of 500 sentences out of the 7.9 million sentences with a frequency higher than one found no PHI among them. These promising results suggest that such sentences could be released to obtain sentence-level NLP annotations via crowdsourcing. |
format | Online Article Text |
id | pubmed-5859924 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
record_format | MEDLINE/PubMed |
spelling | pubmed-58599242018-03-20 A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing Li, Dingcheng Mojarad, Majid Rastegar Li, Yanpeng Sohn, Sunghwan Mehrabi, Saeed Elayavilli, Ravikumar Komandur Yu, Yue Liu, Hongfang Stud Health Technol Inform Article In clinical NLP, one major barrier to adopting crowdsourcing for NLP annotation is the issue of confidentiality for protected health information (PHI) in clinical narratives. In this paper, we investigated the use of a frequency-based approach to extract sentences without PHI. Our approach is based on the assumption that sentences appearing frequently tend to contain no PHI. Both manual and automatic evaluations on 500 sentences out of the 7.9 million sentences of frequencies higher than one show that no PHI can be found among them. The promising results provide potentials of releasing those sentences for obtaining sentence-level NLP annotations via crowdsourcing. 2015 /pmc/articles/PMC5859924/ /pubmed/26262333 Text en This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc-nd/4.0/) . |
spellingShingle | Article Li, Dingcheng Mojarad, Majid Rastegar Li, Yanpeng Sohn, Sunghwan Mehrabi, Saeed Elayavilli, Ravikumar Komandur Yu, Yue Liu, Hongfang A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing |
title | A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing |
title_sort | frequency-based strategy of obtaining sentences from clinical data repository for crowdsourcing |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5859924/ https://www.ncbi.nlm.nih.gov/pubmed/26262333 |
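
The frequency-based idea in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `normalize` and `frequent_sentences` helpers, the lowercasing and whitespace normalization, and the toy corpus (including the fictional name, date, and place) are all assumptions made for illustration. Only the cut-off, a frequency higher than one, comes from the record.

```python
from collections import Counter
import re


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match.

    Assumption: the record does not say how sentences were compared.
    """
    return re.sub(r"\s+", " ", sentence.strip().lower())


def frequent_sentences(sentences, min_count=2):
    """Return {normalized sentence: count} for sentences seen at least min_count times.

    min_count=2 corresponds to "frequency higher than one" in the description.
    """
    counts = Counter(normalize(s) for s in sentences if s.strip())
    return {s: n for s, n in counts.items() if n >= min_count}


# Hypothetical toy corpus; the name, date, and place are invented.
corpus = [
    "No known drug allergies.",
    "no known   drug allergies.",
    "Patient denies chest pain.",
    "Mr. Smith was seen on 03/04/2012 in Rochester.",
    "Patient denies chest pain.",
]

candidates = frequent_sentences(corpus)
for sentence, count in candidates.items():
    print(count, sentence)
```

In this toy corpus the sentence carrying a name and date occurs once and is filtered out, while the two generic sentences are kept. Frequency is only a heuristic, which is presumably why the description reports manual and automatic checks on a 500-sentence sample before proposing release.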