DNorm: disease name normalization with pairwise learning to rank

Bibliographic Details
Main authors:	Leaman, Robert; Islamaj Doğan, Rezarta; Lu, Zhiyong
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2013
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3810844/ https://www.ncbi.nlm.nih.gov/pubmed/23969135 http://dx.doi.org/10.1093/bioinformatics/btt474

author	Leaman, Robert; Islamaj Doğan, Rezarta; Lu, Zhiyong
collection	PubMed
description	Motivation: Despite the central role of diseases in biomedical research, there have been far fewer attempts to automatically determine which diseases are mentioned in a text (the task of disease name normalization, DNorm) compared with other normalization tasks in biomedical text mining research. Methods: In this article we introduce the first machine learning approach for DNorm, using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH® and OMIM. Our method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data. The technique is based on pairwise learning to rank, which has not previously been applied to the normalization task but has proven successful in large optimization problems for information retrieval. Results: We compare our method with several techniques based on lexical normalization and matching, MetaMap and Lucene. Our algorithm achieves 0.782 micro-averaged F-measure and 0.809 macro-averaged F-measure, an increase over the highest-performing baseline method of 0.121 and 0.098, respectively. Availability: The source code for DNorm is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/DNorm, along with a web-based demonstration and links to the NCBI disease corpus. Results on PubMed abstracts are available in PubTator: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator Contact: zhiyong.lu@nih.gov
format	Online Article Text
id	pubmed-3810844
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-3810844 (2013-10-29). Journal: Bioinformatics (Original Papers), Oxford University Press; issue date 2013-11-15, published online 2013-08-21. Text, English. © The Author 2013. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	DNorm: disease name normalization with pairwise learning to rank
topic	Original Papers
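The description says DNorm learns mention-to-name similarities with pairwise learning to rank. The Python sketch below illustrates that idea under stated assumptions. Mentions and concept names are represented as term vectors (binary here; a real system might use TF-IDF weights), and similarity is the bilinear score m^T W n. W starts as the identity matrix, so initial scores are plain token overlap. Training applies a hinge-loss update whenever a correct name fails to outscore an incorrect one by a margin of 1. This record does not specify any of these choices. The helper names `identity`, `score`, `train_pairwise` and `rank` are hypothetical, and the example is illustrative only.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = List[List[float]]


def identity(dim: int) -> Matrix:
    """Identity matrix: with W = I the score reduces to a plain dot product."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def score(W: Matrix, m: Vector, n: Vector) -> float:
    """Bilinear similarity m^T W n (an assumed scoring function, for illustration)."""
    return sum(m[i] * W[i][j] * n[j] for i in range(len(m)) for j in range(len(n)))


def train_pairwise(
    examples: List[Tuple[Vector, Vector, Vector]],
    dim: int,
    rate: float = 0.1,
    epochs: int = 20,
    margin: float = 1.0,
) -> Matrix:
    """Hypothetical trainer: each example is (mention, correct name, incorrect name).

    Minimizes the pairwise hinge loss max(0, margin - score(m, pos) + score(m, neg))
    with gradient steps W += rate * m (pos - neg)^T whenever the margin is violated.
    """
    W = identity(dim)
    for _ in range(epochs):
        for m, pos, neg in examples:
            if score(W, m, pos) - score(W, m, neg) < margin:
                for i in range(dim):
                    for j in range(dim):
                        W[i][j] += rate * m[i] * (pos[j] - neg[j])
    return W


def rank(W: Matrix, m: Vector, names: List[Vector]) -> List[int]:
    """Indices of candidate names, best-scoring first."""
    scores = [score(W, m, n) for n in names]
    return sorted(range(len(names)), key=lambda k: scores[k], reverse=True)


if __name__ == "__main__":
    # Toy vocabulary (assumed): [kidney, renal, failure, cancer]
    mention = [1.0, 0.0, 1.0, 0.0]    # "kidney failure"
    correct = [0.0, 1.0, 1.0, 0.0]    # "renal failure"
    incorrect = [1.0, 0.0, 0.0, 1.0]  # "kidney cancer"
    W = train_pairwise([(mention, correct, incorrect)], dim=4)
    print(rank(W, mention, [incorrect, correct]))  # → [1, 0]
```

With the identity matrix, both candidates tie because each shares exactly one token with the mention. After training, the learned off-diagonal weights let "renal failure" outrank "kidney cancer" for "kidney failure". Learning such term-to-term associations from data, rather than relying on exact lexical matches, is the kind of behavior the description attributes to the approach.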

Similar items