
Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records

BACKGROUND: Information on ethnicity is commonly used by health services and researchers to plan services, ensure equality of access, and for epidemiological studies. In common with other important demographic and clinical data it is often incompletely recorded. This paper presents a method for impu...


Bibliographic Details
Main Authors:	Ryan, Ronan, Vernon, Sally, Lawrence, Gill, Wilson, Sue
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3353229/ https://www.ncbi.nlm.nih.gov/pubmed/22269985 http://dx.doi.org/10.1186/1472-6947-12-3

_version_	1782233016728289280
author	Ryan, Ronan Vernon, Sally Lawrence, Gill Wilson, Sue
author_facet	Ryan, Ronan Vernon, Sally Lawrence, Gill Wilson, Sue
author_sort	Ryan, Ronan
collection	PubMed
description	BACKGROUND: Information on ethnicity is commonly used by health services and researchers to plan services, ensure equality of access, and for epidemiological studies. In common with other important demographic and clinical data it is often incompletely recorded. This paper presents a method for imputing missing data on the ethnicity of cancer patients, developed for a regional cancer registry in the UK. METHODS: Routine records from cancer screening services, name recognition software (Nam Pehchan and Onomap), 2001 national Census data, and multiple imputation were used to predict the ethnicity of the 23% of cases that were still missing following linkage with self-reported ethnicity from inpatient hospital records. RESULTS: The name recognition software were good predictors of ethnicity for South Asian cancer cases when compared with data on ethnicity derived from hospital inpatient records, especially when combined (sensitivity 90.5%; specificity 99.9%; PPV 93.3%). Onomap was a poor predictor of ethnicity for other minority ethnic groups (sensitivity 4.4% for Black cases and 0.0% for Chinese/Other ethnic groups). Area-based data derived from the national Census was also a poor predictor of non-White ethnicity (sensitivity: South Asian 7.4%; Black 2.3%; Chinese/Other 0.0%; Mixed 0.0%). CONCLUSIONS: Currently, neither method for assigning individuals to an ethnic group (name recognition and ethnic distribution of area of residence) performs well across all ethnic groups. We recommend further development of name recognition applications and the identification of additional methods for predicting ethnicity to improve their precision and accuracy for comparisons of health outcomes. However, real improvements can only come from better recording of ethnicity by health services.
format	Online Article Text
id	pubmed-3353229
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3353229 2012-05-16 Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records Ryan, Ronan Vernon, Sally Lawrence, Gill Wilson, Sue BMC Med Inform Decis Mak Research Article BACKGROUND: Information on ethnicity is commonly used by health services and researchers to plan services, ensure equality of access, and for epidemiological studies. In common with other important demographic and clinical data it is often incompletely recorded. This paper presents a method for imputing missing data on the ethnicity of cancer patients, developed for a regional cancer registry in the UK. METHODS: Routine records from cancer screening services, name recognition software (Nam Pehchan and Onomap), 2001 national Census data, and multiple imputation were used to predict the ethnicity of the 23% of cases that were still missing following linkage with self-reported ethnicity from inpatient hospital records. RESULTS: The name recognition software were good predictors of ethnicity for South Asian cancer cases when compared with data on ethnicity derived from hospital inpatient records, especially when combined (sensitivity 90.5%; specificity 99.9%; PPV 93.3%). Onomap was a poor predictor of ethnicity for other minority ethnic groups (sensitivity 4.4% for Black cases and 0.0% for Chinese/Other ethnic groups). Area-based data derived from the national Census was also a poor predictor of non-White ethnicity (sensitivity: South Asian 7.4%; Black 2.3%; Chinese/Other 0.0%; Mixed 0.0%). CONCLUSIONS: Currently, neither method for assigning individuals to an ethnic group (name recognition and ethnic distribution of area of residence) performs well across all ethnic groups. We recommend further development of name recognition applications and the identification of additional methods for predicting ethnicity to improve their precision and accuracy for comparisons of health outcomes. However, real improvements can only come from better recording of ethnicity by health services. BioMed Central 2012-01-23 /pmc/articles/PMC3353229/ /pubmed/22269985 http://dx.doi.org/10.1186/1472-6947-12-3 Text en Copyright ©2012 Ryan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Ryan, Ronan Vernon, Sally Lawrence, Gill Wilson, Sue Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records
title	Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records
title_full	Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records
title_fullStr	Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records
title_full_unstemmed	Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records
title_short	Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records
title_sort	use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3353229/ https://www.ncbi.nlm.nih.gov/pubmed/22269985 http://dx.doi.org/10.1186/1472-6947-12-3
work_keys_str_mv	AT ryanronan useofnamerecognitionsoftwarecensusdataandmultipleimputationtopredictmissingdataonethnicityapplicationtocancerregistryrecords AT vernonsally useofnamerecognitionsoftwarecensusdataandmultipleimputationtopredictmissingdataonethnicityapplicationtocancerregistryrecords AT lawrencegill useofnamerecognitionsoftwarecensusdataandmultipleimputationtopredictmissingdataonethnicityapplicationtocancerregistryrecords AT wilsonsue useofnamerecognitionsoftwarecensusdataandmultipleimputationtopredictmissingdataonethnicityapplicationtocancerregistryrecords
