
HypertenGene: extracting key hypertension genes from biomedical literature with position and automatically-generated template features


Bibliographic Details
Main authors: Tsai, Richard Tzong-Han, Lai, Po-Ting, Dai, Hong-Jie, Huang, Chi-Hsin, Bow, Yue-Yang, Chang, Yen-Ching, Pan, Wen-Harn, Hsu, Wen-Lian
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2788360/
https://www.ncbi.nlm.nih.gov/pubmed/19958519
http://dx.doi.org/10.1186/1471-2105-10-S15-S9
collection PubMed
description BACKGROUND: The genetic factors leading to hypertension have been extensively studied, and large numbers of research papers have been published on the subject. One of hypertension researchers' primary research tasks is to locate key hypertension-related genes in abstracts. However, gathering such information with existing tools is not easy: (1) Searching for articles often returns far too many hits to browse through. (2) The search results do not highlight the hypertension-related genes discovered in the abstract. (3) Even though some text mining services mark up gene names in the abstract, the key genes investigated in a paper are still not distinguished from other genes. To facilitate the information gathering process for hypertension researchers, one solution would be to extract the key hypertension-related genes in each abstract. Three major tasks are involved in the construction of this system: (1) gene and hypertension named entity recognition, (2) section categorization, and (3) gene-hypertension relation extraction. RESULTS: We first compare the retrieval performance achieved by individually adding template features and position features to the baseline system. Then, the combination of both is examined. We found that using position features can almost double the original AUC score of the baseline system (0.8140 vs. 0.4936). However, adding template features results in only a marginal improvement (0.0197). Including both improves AUC to 0.8184, indicating that these two sets of features are complementary and do not have overlapping effects. We then examine the performance in a different domain, diabetes, and the result shows a satisfactory AUC of 0.83. CONCLUSION: Our approach successfully exploits template features to recognize true hypertension-related gene mentions and position features to distinguish key genes from other related genes. Templates are automatically generated and checked by biologists to minimize labor costs.
Our approach integrates the advantages of machine learning models and pattern matching. To the best of our knowledge, this is the first systematic study of extracting hypertension-related genes and the first attempt to create a hypertension-gene relation corpus based on the GAD database. Furthermore, our paper proposes and tests novel features for extracting key hypertension genes, such as relative position, section, and template features, which could also be applied to key-gene extraction for other diseases.
id pubmed-2788360
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
record pubmed-2788360, dated 2009-12-04
journal BMC Bioinformatics (Proceedings), BioMed Central, 2009-12-03
rights Copyright © 2009 Tsai et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Proceedings