Cargando…

Occupation Coding of Job Titles: Iterative Development of an Automated Coding Algorithm for the Canadian National Occupation Classification (ACA-NOC)

BACKGROUND: In many research studies, the identification of social determinants is an important activity, in particular, information about occupations is frequently added to existing patient data. Such information is usually solicited during interviews with open-ended questions such as “What is your...

Descripción completa

Detalles Bibliográficos
Autores principales: Bao, Hongchang, Baker, Christopher J O, Adisesh, Anil
Formato: Online Artículo Texto
Lenguaje:English
Publicado: JMIR Publications 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7439137/
https://www.ncbi.nlm.nih.gov/pubmed/32755893
http://dx.doi.org/10.2196/16422
_version_ 1783572920906285056
author Bao, Hongchang
Baker, Christopher J O
Adisesh, Anil
author_facet Bao, Hongchang
Baker, Christopher J O
Adisesh, Anil
author_sort Bao, Hongchang
collection PubMed
description BACKGROUND: In many research studies, the identification of social determinants is an important activity, in particular, information about occupations is frequently added to existing patient data. Such information is usually solicited during interviews with open-ended questions such as “What is your job?” and “What industry sector do you work in?” Before being able to use this information for further analysis, the responses need to be categorized using a coding system, such as the Canadian National Occupational Classification (NOC). Manual coding is the usual method, which is a time-consuming and error-prone activity, suitable for automation. OBJECTIVE: This study aims to facilitate automated coding by introducing a rigorous algorithm that will be able to identify the NOC (2016) codes using only job title and industry information as input. Using manually coded data sets, we sought to benchmark and iteratively improve the performance of the algorithm. METHODS: We developed the ACA-NOC algorithm based on the NOC (2016), which allowed users to match NOC codes with job and industry titles. We employed several different search strategies in the ACA-NOC algorithm to find the best match, including exact search, minor exact search, like search, near (same order) search, near (different order) search, any search, and weak match search. In addition, a filtering step based on the hierarchical structure of the NOC data was applied to the algorithm to select the best matching codes. RESULTS: The ACA-NOC was applied to over 500 manually coded job and industry titles. The accuracy rate at the four-digit NOC code level was 58.7% (332/566) and improved when broader job categories were considered (65.0% at the three-digit NOC code level, 72.3% at the two-digit NOC code level, and 81.6% at the one-digit NOC code level). CONCLUSIONS: The ACA-NOC is a rigorous algorithm for automatically coding the Canadian NOC system and has been evaluated using real-world data. It allows researchers to code moderate-sized data sets with occupation in a timely and cost-efficient manner such that further analytics are possible. Initial assessments indicate that it has state-of-the-art performance and is readily extensible upon further benchmarking on larger data sets.
format Online
Article
Text
id pubmed-7439137
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-74391372020-08-31 Occupation Coding of Job Titles: Iterative Development of an Automated Coding Algorithm for the Canadian National Occupation Classification (ACA-NOC) Bao, Hongchang Baker, Christopher J O Adisesh, Anil JMIR Form Res Original Paper BACKGROUND: In many research studies, the identification of social determinants is an important activity, in particular, information about occupations is frequently added to existing patient data. Such information is usually solicited during interviews with open-ended questions such as “What is your job?” and “What industry sector do you work in?” Before being able to use this information for further analysis, the responses need to be categorized using a coding system, such as the Canadian National Occupational Classification (NOC). Manual coding is the usual method, which is a time-consuming and error-prone activity, suitable for automation. OBJECTIVE: This study aims to facilitate automated coding by introducing a rigorous algorithm that will be able to identify the NOC (2016) codes using only job title and industry information as input. Using manually coded data sets, we sought to benchmark and iteratively improve the performance of the algorithm. METHODS: We developed the ACA-NOC algorithm based on the NOC (2016), which allowed users to match NOC codes with job and industry titles. We employed several different search strategies in the ACA-NOC algorithm to find the best match, including exact search, minor exact search, like search, near (same order) search, near (different order) search, any search, and weak match search. In addition, a filtering step based on the hierarchical structure of the NOC data was applied to the algorithm to select the best matching codes. RESULTS: The ACA-NOC was applied to over 500 manually coded job and industry titles. The accuracy rate at the four-digit NOC code level was 58.7% (332/566) and improved when broader job categories were considered (65.0% at the three-digit NOC code level, 72.3% at the two-digit NOC code level, and 81.6% at the one-digit NOC code level). CONCLUSIONS: The ACA-NOC is a rigorous algorithm for automatically coding the Canadian NOC system and has been evaluated using real-world data. It allows researchers to code moderate-sized data sets with occupation in a timely and cost-efficient manner such that further analytics are possible. Initial assessments indicate that it has state-of-the-art performance and is readily extensible upon further benchmarking on larger data sets. JMIR Publications 2020-08-05 /pmc/articles/PMC7439137/ /pubmed/32755893 http://dx.doi.org/10.2196/16422 Text en ©Hongchang Bao, Christopher J O Baker, Anil Adisesh. Originally published in JMIR Formative Research (http://formative.jmir.org), 05.08.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on http://formative.jmir.org, as well as this copyright and license information must be included.
spellingShingle Original Paper
Bao, Hongchang
Baker, Christopher J O
Adisesh, Anil
Occupation Coding of Job Titles: Iterative Development of an Automated Coding Algorithm for the Canadian National Occupation Classification (ACA-NOC)
title Occupation Coding of Job Titles: Iterative Development of an Automated Coding Algorithm for the Canadian National Occupation Classification (ACA-NOC)
title_full Occupation Coding of Job Titles: Iterative Development of an Automated Coding Algorithm for the Canadian National Occupation Classification (ACA-NOC)
title_fullStr Occupation Coding of Job Titles: Iterative Development of an Automated Coding Algorithm for the Canadian National Occupation Classification (ACA-NOC)
title_full_unstemmed Occupation Coding of Job Titles: Iterative Development of an Automated Coding Algorithm for the Canadian National Occupation Classification (ACA-NOC)
title_short Occupation Coding of Job Titles: Iterative Development of an Automated Coding Algorithm for the Canadian National Occupation Classification (ACA-NOC)
title_sort occupation coding of job titles: iterative development of an automated coding algorithm for the canadian national occupation classification (aca-noc)
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7439137/
https://www.ncbi.nlm.nih.gov/pubmed/32755893
http://dx.doi.org/10.2196/16422
work_keys_str_mv AT baohongchang occupationcodingofjobtitlesiterativedevelopmentofanautomatedcodingalgorithmforthecanadiannationaloccupationclassificationacanoc
AT bakerchristopherjo occupationcodingofjobtitlesiterativedevelopmentofanautomatedcodingalgorithmforthecanadiannationaloccupationclassificationacanoc
AT adiseshanil occupationcodingofjobtitlesiterativedevelopmentofanautomatedcodingalgorithmforthecanadiannationaloccupationclassificationacanoc