OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature
BACKGROUND: Single Nucleotide Polymorphisms, among other types of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation is found in databases such as dbSNP, clues about the functional and phenotypic consequences of the var...
Main authors: | Furlong, Laura I; Dach, Holger; Hofmann-Apitius, Martin; Sanz, Ferran |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2008 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2277400/ https://www.ncbi.nlm.nih.gov/pubmed/18251998 http://dx.doi.org/10.1186/1471-2105-9-84 |
_version_ | 1782152021886894080 |
---|---|
author | Furlong, Laura I Dach, Holger Hofmann-Apitius, Martin Sanz, Ferran |
author_facet | Furlong, Laura I Dach, Holger Hofmann-Apitius, Martin Sanz, Ferran |
author_sort | Furlong, Laura I |
collection | PubMed |
description | BACKGROUND: Single Nucleotide Polymorphisms, among other types of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation is found in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in biomedical literature. The identification of the relevant documents and the extraction of the information from them are hampered by the large size of literature databases and the lack of a widely accepted standard notation for biomedical entities. Thus, automatic systems for the identification of citations of allelic variants of genes in biomedical texts are required. RESULTS: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes. Here we describe the development of a new version of OSIRIS (OSIRISv1.2), which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB, a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm for the identification of variation terms in the texts and their mapping to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system for collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. CONCLUSION: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and therefore is suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. The application of OSIRISv1.2 in combination with controlled vocabularies like MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs with diseases. |
format | Text |
id | pubmed-2277400 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2277400 2008-04-01 OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature Furlong, Laura I Dach, Holger Hofmann-Apitius, Martin Sanz, Ferran BMC Bioinformatics Methodology Article BACKGROUND: Single Nucleotide Polymorphisms, among other types of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation is found in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in biomedical literature. The identification of the relevant documents and the extraction of the information from them are hampered by the large size of literature databases and the lack of a widely accepted standard notation for biomedical entities. Thus, automatic systems for the identification of citations of allelic variants of genes in biomedical texts are required. RESULTS: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes. Here we describe the development of a new version of OSIRIS (OSIRISv1.2), which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB, a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm for the identification of variation terms in the texts and their mapping to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system for collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. CONCLUSION: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and therefore is suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. The application of OSIRISv1.2 in combination with controlled vocabularies like MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs with diseases. BioMed Central 2008-02-05 /pmc/articles/PMC2277400/ /pubmed/18251998 http://dx.doi.org/10.1186/1471-2105-9-84 Text en Copyright © 2008 Furlong et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Furlong, Laura I Dach, Holger Hofmann-Apitius, Martin Sanz, Ferran OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature |
title | OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature |
title_full | OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature |
title_fullStr | OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature |
title_full_unstemmed | OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature |
title_short | OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature |
title_sort | osirisv1.2: a named entity recognition system for sequence variants of genes in biomedical literature |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2277400/ https://www.ncbi.nlm.nih.gov/pubmed/18251998 http://dx.doi.org/10.1186/1471-2105-9-84 |
work_keys_str_mv | AT furlonglaurai osirisv12anamedentityrecognitionsystemforsequencevariantsofgenesinbiomedicalliterature AT dachholger osirisv12anamedentityrecognitionsystemforsequencevariantsofgenesinbiomedicalliterature AT hofmannapitiusmartin osirisv12anamedentityrecognitionsystemforsequencevariantsofgenesinbiomedicalliterature AT sanzferran osirisv12anamedentityrecognitionsystemforsequencevariantsofgenesinbiomedicalliterature |