RLT-S: A Web System for Record Linkage

BACKGROUND: Record linkage integrates records across multiple related data sources, identifying duplicates and accounting for possible errors. Real-life applications require efficient algorithms to merge these voluminous data sources and find all records belonging to the same individuals. Our recently...


Bibliographic Details
Main Authors: Mamun, Abdullah-Al, Aseltine, Robert, Rajasekaran, Sanguthevar
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4420456/
https://www.ncbi.nlm.nih.gov/pubmed/25942687
http://dx.doi.org/10.1371/journal.pone.0124449
author Mamun, Abdullah-Al
Aseltine, Robert
Rajasekaran, Sanguthevar
collection PubMed
description BACKGROUND: Record linkage integrates records across multiple related data sources, identifying duplicates and accounting for possible errors. Real-life applications require efficient algorithms to merge these voluminous data sources and find all records belonging to the same individuals. Our recently devised, highly efficient record linkage algorithms provide the best-known solutions to this challenging problem.
METHOD: We have developed RLT-S, a freely available web tool that implements our single linkage clustering algorithm for record linkage. The tool requires the input data sets and a small set of configuration settings about these files to work efficiently. RLT-S employs exact match clustering, blocking on a specified attribute, and single-linkage-based hierarchical clustering among these blocks.
RESULTS: RLT-S is an implementation package of our sequential record linkage algorithm. It outperforms previous best-known implementations by a large margin. The tool is at least two times faster for any dataset than the previous best-known tools.
CONCLUSIONS: The RLT-S tool implements our record linkage algorithm, which outperforms previous best-known algorithms in this area. The website also contains supporting information such as instructions, submission history, feedback, publications, and some other sections to facilitate use of the tool.
AVAILABILITY: RLT-S is integrated into http://www.rlatools.com, which is currently serving this tool only. The tool is freely available and can be used without login. All data files used in this paper have been stored in https://github.com/abdullah009/DataRLATools. For copies of the relevant programs, please see https://github.com/abdullah009/RLATools.
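The three stages named in the METHOD section (exact match clustering, blocking on a specified attribute, and single linkage clustering within blocks) can be illustrated with a minimal Python sketch. This is not the RLT-S implementation. The function names, the prefix-based blocking key, the summed edit distance between records, and the default threshold are all assumptions chosen for illustration; the abstract does not specify them.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def link_records(records, block_attr, block_len=3, threshold=2):
    """Cluster record indices; all parameters are illustrative assumptions.

    records    -- list of dicts that share the same keys
    block_attr -- attribute whose prefix is used as the blocking key
    block_len  -- prefix length for the blocking key (assumed)
    threshold  -- maximum summed edit distance to link two records (assumed)
    """
    # Stage 1: exact match clustering. Identical records collapse to one
    # representative so later stages compare fewer items.
    exact = defaultdict(list)
    for idx, rec in enumerate(records):
        exact[tuple(sorted(rec.items()))].append(idx)
    reps = [ids[0] for ids in exact.values()]

    # Union-find over the representatives.
    parent = {r: r for r in reps}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Stage 2: blocking. Only records sharing a key are compared.
    blocks = defaultdict(list)
    for r in reps:
        key = str(records[r][block_attr])[:block_len].lower()
        blocks[key].append(r)

    def distance(r1, r2):
        return sum(edit_distance(str(records[r1][k]), str(records[r2][k]))
                   for k in records[r1])

    # Stage 3: single linkage within each block. Cutting a single linkage
    # hierarchy at a threshold yields the connected components of the graph
    # whose edges join pairs at distance <= threshold.
    for members in blocks.values():
        for a, b in combinations(members, 2):
            if distance(a, b) <= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for ids in exact.values():
        clusters[find(ids[0])].extend(ids)
    return sorted(sorted(c) for c in clusters.values())
```

With four toy records, two identical "john smith" entries, one "jon smith", and one "mary jones", blocking on the "last" attribute groups the first three together and leaves the fourth alone. Note that under this assumed prefix key, a typo in the first characters of the blocking attribute would place a record in a different block and prevent it from being linked.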
format Online
Article
Text
id pubmed-4420456
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4420456 2015-05-12 RLT-S: A Web System for Record Linkage Mamun, Abdullah-Al Aseltine, Robert Rajasekaran, Sanguthevar PLoS One Research Article
Public Library of Science 2015-05-05 /pmc/articles/PMC4420456/ /pubmed/25942687 http://dx.doi.org/10.1371/journal.pone.0124449 Text en © 2015 Mamun et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title RLT-S: A Web System for Record Linkage
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4420456/
https://www.ncbi.nlm.nih.gov/pubmed/25942687
http://dx.doi.org/10.1371/journal.pone.0124449