Evaluating global and local sequence alignment methods for comparing patient medical records
BACKGROUND: Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity. For Electronic Health Records (EHR) data, sequence alignment helps to ide...
Main authors: | Huang, Ming; Shah, Nilay D.; Yao, Lixia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6921442/ https://www.ncbi.nlm.nih.gov/pubmed/31856819 http://dx.doi.org/10.1186/s12911-019-0965-y |
_version_ | 1783481163186176000 |
---|---|
author | Huang, Ming Shah, Nilay D. Yao, Lixia |
author_facet | Huang, Ming Shah, Nilay D. Yao, Lixia |
author_sort | Huang, Ming |
collection | PubMed |
description | BACKGROUND: Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity. For Electronic Health Records (EHR) data, sequence alignment helps to identify patients with similar disease trajectories for more relevant and precise prognosis, diagnosis and treatment of patients. METHODS: We tested two cutting-edge global sequence alignment methods, namely dynamic time warping (DTW) and the Needleman-Wunsch algorithm (NWA), together with their local modifications, DTW for Local alignment (DTWL) and the Smith-Waterman algorithm (SWA), for aligning patient medical records. We also used 4 sets of synthetic patient medical records generated from a large real-world EHR database as gold standard data, to objectively evaluate these sequence alignment algorithms. RESULTS: For global sequence alignments, 47 out of 80 DTW alignments and 11 out of 80 NWA alignments had higher similarity scores than the reference alignments, while the remaining 33 DTW alignments and 69 NWA alignments had the same similarity scores as the reference alignments. Forty-six out of 80 DTW alignments had better similarity scores than NWA alignments, with the remaining 34 cases having equal similarity scores from both algorithms. For local sequence alignments, 70 out of 80 DTWL alignments and 68 out of 80 SWA alignments had larger coverage and higher similarity scores than the reference alignments, while the remaining DTWL and SWA alignments received the same coverage and similarity scores as the reference alignments. Six out of 80 DTWL alignments showed larger coverage and higher similarity scores than SWA alignments. Thirty DTWL alignments had equal coverage but better similarity scores than SWA alignments. DTWL and SWA received equal coverage and similarity scores for the remaining 44 cases. CONCLUSIONS: DTW, NWA, DTWL and SWA outperformed the reference alignments. DTW (or DTWL) seems to align better than NWA (or SWA) by inserting new daily events and identifying more similarities between patient medical records. The evaluation results could provide valuable information on the strengths and weaknesses of these sequence alignment methods for future development of sequence alignment methods and patient similarity-based studies. |
format | Online Article Text |
id | pubmed-6921442 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6921442 2019-12-30 Evaluating global and local sequence alignment methods for comparing patient medical records Huang, Ming Shah, Nilay D. Yao, Lixia BMC Med Inform Decis Mak Research BioMed Central 2019-12-19 /pmc/articles/PMC6921442/ /pubmed/31856819 http://dx.doi.org/10.1186/s12911-019-0965-y Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Huang, Ming Shah, Nilay D. Yao, Lixia Evaluating global and local sequence alignment methods for comparing patient medical records |
title | Evaluating global and local sequence alignment methods for comparing patient medical records |
title_full | Evaluating global and local sequence alignment methods for comparing patient medical records |
title_fullStr | Evaluating global and local sequence alignment methods for comparing patient medical records |
title_full_unstemmed | Evaluating global and local sequence alignment methods for comparing patient medical records |
title_short | Evaluating global and local sequence alignment methods for comparing patient medical records |
title_sort | evaluating global and local sequence alignment methods for comparing patient medical records |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6921442/ https://www.ncbi.nlm.nih.gov/pubmed/31856819 http://dx.doi.org/10.1186/s12911-019-0965-y |
work_keys_str_mv | AT huangming evaluatingglobalandlocalsequencealignmentmethodsforcomparingpatientmedicalrecords AT shahnilayd evaluatingglobalandlocalsequencealignmentmethodsforcomparingpatientmedicalrecords AT yaolixia evaluatingglobalandlocalsequencealignmentmethodsforcomparingpatientmedicalrecords |
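
The description field above names three alignment families: Needleman-Wunsch (global), Smith-Waterman (local), and dynamic time warping. As a rough illustration only, the sketch below scores two short lists of medical event codes with textbook versions of each. It is not the authors' implementation: the function names, the match/mismatch/gap values, the 0/1 event cost, and the example event codes are all assumptions made for this sketch, and DTW is shown in its classic distance-minimizing form rather than with the similarity scores reported in the record.

```python
from typing import Hashable, Sequence

Events = Sequence[Hashable]


def needleman_wunsch_score(a: Events, b: Events,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score: every event in both sequences must be aligned or gapped."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    # Leading gaps are penalized in a global alignment.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,     # gap in b
                           dp[i][j - 1] + gap)     # gap in a
    return dp[-1][-1]


def smith_waterman_score(a: Events, b: Events,
                         match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Local alignment score: best-scoring pair of contiguous regions."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 lets an alignment restart anywhere.
            dp[i][j] = max(0,
                           dp[i - 1][j - 1] + s,
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
            best = max(best, dp[i][j])
    return best


def dtw_distance(a: Events, b: Events) -> float:
    """Classic DTW distance with an assumed 0/1 cost per pair of events.

    Instead of inserting gaps, DTW lets one event be matched to several
    events in the other sequence (warping).
    """
    inf = float("inf")
    dp = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    dp[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            dp[i][j] = cost + min(dp[i - 1][j],      # a[i-1] repeats against b[j-1]
                                  dp[i][j - 1],      # b[j-1] repeats against a[i-1]
                                  dp[i - 1][j - 1])  # advance both sequences
    return dp[-1][-1]


if __name__ == "__main__":
    # Hypothetical event codes, invented for this example.
    patient_a = ["dx_diabetes", "rx_metformin", "lab_hba1c"]
    patient_b = ["dx_diabetes", "lab_hba1c"]
    print(needleman_wunsch_score(patient_a, patient_b))
    print(smith_waterman_score(patient_a, patient_b))
    print(dtw_distance(patient_a, patient_b))
```

The three functions share the same dynamic-programming table shape and differ only in boundary conditions and the recurrence, which is one way to see why global and local variants can be compared on the same record pairs. Real patient records would need a domain-specific similarity between events rather than exact code equality.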