Evaluating global and local sequence alignment methods for comparing patient medical records

Bibliographic Details
Main authors: Huang, Ming; Shah, Nilay D.; Yao, Lixia
Format: Online article, text
Language: English
Published: BioMed Central 2019
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6921442/
https://www.ncbi.nlm.nih.gov/pubmed/31856819
http://dx.doi.org/10.1186/s12911-019-0965-y
collection PubMed
description BACKGROUND: Sequence alignment arranges sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify how related two or more sequences are and where they share regions of similarity. For Electronic Health Records (EHR) data, sequence alignment helps identify patients with similar disease trajectories, supporting more relevant and precise prognosis, diagnosis, and treatment. METHODS: We tested two cutting-edge global sequence alignment methods, dynamic time warping (DTW) and the Needleman-Wunsch algorithm (NWA), together with their local modifications, DTW for Local alignment (DTWL) and the Smith-Waterman algorithm (SWA), for aligning patient medical records. We used four sets of synthetic patient medical records generated from a large real-world EHR database as gold standard data to evaluate these sequence alignment algorithms objectively. RESULTS: For global sequence alignments, 47 of 80 DTW alignments and 11 of 80 NWA alignments had higher similarity scores than the reference alignments, while the remaining 33 DTW alignments and 69 NWA alignments had the same similarity scores as the reference alignments. Forty-six of 80 DTW alignments had higher similarity scores than the corresponding NWA alignments, and the remaining 34 cases received equal similarity scores from both algorithms. For local sequence alignments, 70 of 80 DTWL alignments and 68 of 80 SWA alignments had larger coverage and higher similarity scores than the reference alignments, while the remaining 10 DTWL and 12 SWA alignments had the same coverage and similarity scores as the reference alignments. Six of 80 DTWL alignments showed larger coverage and higher similarity scores than the corresponding SWA alignments. Thirty DTWL alignments had equal coverage but higher similarity scores than SWA. DTWL and SWA received equal coverage and similarity scores in the remaining 44 cases. CONCLUSIONS: DTW, NWA, DTWL and SWA outperformed the reference alignments. DTW (or DTWL) appears to align better than NWA (or SWA) by inserting new daily events and identifying more similarities between patient medical records. The evaluation results could provide valuable information on the strengths and weaknesses of these sequence alignment methods, both for the future development of sequence alignment methods and for patient similarity-based studies.
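The abstract names the alignment techniques but not their scoring details. The following is a minimal, hypothetical sketch of three of them (Needleman-Wunsch, Smith-Waterman, and classic DTW), assuming each patient record is a list of days, each day is a set of event codes, days are compared with a Jaccard-based score, and gaps cost a linear penalty of -1. None of these choices, nor the helper names (`day_score`, `record`, and so on) or the event codes, come from the paper. DTWL, the authors' local DTW modification, is not reproduced because the abstract does not describe it.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical representation (not from the paper): a patient record is a
# list of days, and each day is the set of event codes recorded that day.
Day = FrozenSet[str]
Record = Sequence[Day]

GAP = -1.0  # assumed linear gap penalty for NW and SW


def jaccard(a: Day, b: Day) -> float:
    """Overlap of two days' event sets, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def day_score(a: Day, b: Day) -> float:
    """Assumed substitution score: +1 for identical days, -1 for disjoint days."""
    return 2.0 * jaccard(a, b) - 1.0


def needleman_wunsch(x: Record, y: Record, gap: float = GAP) -> float:
    """Global alignment: every day of both records is either paired or gapped."""
    n, m = len(x), len(y)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(
                f[i - 1][j - 1] + day_score(x[i - 1], y[j - 1]),  # pair two days
                f[i - 1][j] + gap,  # day of x left unpaired
                f[i][j - 1] + gap,  # day of y left unpaired
            )
    return f[n][m]


def smith_waterman(x: Record, y: Record, gap: float = GAP) -> float:
    """Local alignment: score of the best-matching pair of contiguous stretches."""
    n, m = len(x), len(y)
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(
                0.0,  # restart: drop a negative-scoring prefix
                h[i - 1][j - 1] + day_score(x[i - 1], y[j - 1]),
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
            best = max(best, h[i][j])
    return best


def dtw_distance(x: Record, y: Record) -> float:
    """Classic DTW: a day may be matched to several consecutive days of the
    other record (warping), so no gap penalty is used. Lower is more similar."""
    inf = float("inf")
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - jaccard(x[i - 1], y[j - 1])
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]


def record(*days: set) -> List[Day]:
    """Build a record from per-day sets of (made-up) event codes."""
    return [frozenset(day) for day in days]


# Hypothetical records: b repeats the diagnosis day of a.
a = record({"dx:E11"}, {"lab:A1C"}, {"rx:metformin"})
b = record({"dx:E11"}, {"dx:E11"}, {"lab:A1C"}, {"rx:metformin"})

if __name__ == "__main__":
    print(dtw_distance(a, b))      # 0.0: warping absorbs the repeated day
    print(needleman_wunsch(a, b))  # 2.0: three paired days minus one gap
    print(smith_waterman(a, b))    # 3.0: the local stretch skips b's first day
```

In this toy case DTW matches one day of `a` to two consecutive days of `b` at no cost, whereas Needleman-Wunsch must pay a gap penalty for the extra day; this loosely mirrors the abstract's remark that DTW aligns records by inserting daily events. DTW returns a distance and the other two return similarity scores, so the numbers are not directly comparable, and different scoring or gap values would change them.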
id pubmed-6921442
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: BMC Med Inform Decis Mak (Research)
Published online: 2019-12-19
License: © The Author(s). 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research